Preface


With more than two million users worldwide, R is one of the most popular open source 
projects. It is a free and robust statistical programming environment with very powerful 
graphical capabilities. Analyzing and visualizing data with R is a necessary skill for anyone 
doing any kind of statistical analysis, and this book will help you do just that in the easiest 
and most efficient way possible. 

Unlike other books on R, this book takes a practical, hands-on approach and dives straight
into creating graphs in R right from the very first page. If you wish to harness the power of this
mighty open source programming language to visually present and analyze your data in the
best way possible, this book is going to show you how.

The R Graphs Cookbook takes a practical approach to teaching how to create effective 
and useful graphs using R. It will demystify a lot of difficult and confusing R functions and 
parameters. It will enable you to construct and modify data graphics to suit your analysis, 
presentation, and publication needs. 

This practical guide begins by teaching you how to make basic graphs in R and progresses 
through subsequent dedicated chapters about each graph type in depth. You will learn all 
about making graphics such as scatter plots, line graphs, bar charts, pie charts, dot plots, 
heat maps, histograms, and box plots. In addition, there are detailed recipes on making 
various combinations and advanced versions of these graphs. Dedicated chapters on 
polishing and finalizing graphs will enable you to produce professional quality graphs for 
presentation and publication. With the R Graphs Cookbook in hand, making graphs in 
R has never been easier. 


What this book covers 

Chapter 1, Basic Graph Functions introduces recipes for some basic types of graphs, useful in 
almost any kind of data analysis. We will go through all the steps to get you going from reading 
your data into R, making a first graph, tweaking it to suit your needs, and then saving and 
exporting it for use in presentations and publications. 
Chapter 2, Beyond the Basics: Adjusting Key Parameters looks more closely at various
arguments to graph functions and their values, highlighting common pitfalls and workarounds.
The par() function is explained with some useful examples showing how to adjust colors,
sizes, margins, and styles of various graph elements such as points, lines, bars, axes,
and titles.

The subsequent chapters 3 to 9 cover the graph types introduced in the first two chapters in 
more detail. 

Chapter 3, Creating Scatter Plots has over a dozen recipes covering scatter plots, which are
some of the simplest and most commonly used types of graphs in data analysis. We will see
how to enhance plots by adjusting various arguments and using some
new functions.

Chapter 4, Creating Line Graphs and Time Series Charts discusses some more intermediate 
to advanced recipes for customizing line graphs, improving and speeding up line graphs with 
multiple lines, processing dates to make time series charts, sparklines and stock charts. 

Chapter 5, Creating Bar, Dot, and Pie Charts will show you how you can create many useful 
variations of bar graphs and dot plots by using only the base library functions. We will also 
look at a few recipes addressing common criticisms of pie charts with some ways to make 
them more readable. 

Chapter 6, Creating Histograms enhances the basic histogram in R by changing the plotting 
mode and bins, in addition to style adjustments. We will also look at some advanced recipes 
combining histograms with other types of graphs. 

Chapter 7, Creating Box and Whisker Plots looks into various stylistic and structural 
adjustments to box plots. We will start by looking at some basic arguments to change 
individual aspects of a box plot and slowly move to more advanced recipes involving the use 
of multiple function calls. 


Chapter 8, Creating Heat Maps and Contour Plots discusses various types of heat maps 
for visualizing correlations, trends and multivariate data, and contour plots for showing 
topographical information in various two-dimensional and three-dimensional ways. 

Chapter 9, Creating Maps builds on top of the introduction to visualizing data on geographical 
maps in the first chapter, covering recipes for plotting data from the World Bank, World Health 
Organization (WHO), Google Maps API, and some Geographical Information Systems (GIS). 

Chapter 10, Finalizing Graphs for Publications and Presentations discusses some tricks 
and tips to add some polish to our graphs so that they can be used for publication and 
presentation. We will cover many important practical topics such as exported graph file 
formats, high resolution formats, vector formats such as PDF, SVG, and PS, mathematical 
and scientific notations, text descriptions, fonts, graph templates, and themes. 






What you need for this book 

The only software needed for this book is R itself, which is available for download for all
major operating systems at http://cran.r-project.org. Some additional R packages
are required, but these can be installed from within R. The instructions are provided in the
relevant sections of the book.

You will also need the example datasets, which can be downloaded from the book's
companion website: https://www.packtpub.com/r-graph-cookbook/book.


Who this book is for 

This book is for readers who are already familiar with the basics of R and want to learn the best
techniques and code to create graphics in R. It will also serve
as an invaluable reference book for expert R users.


Conventions 

In this book, you will find a number of styles of text that distinguish between different kinds 
of information. Here are some examples of these styles, and an explanation of their meaning. 

Code words in text are shown as follows: "We will use the base graphics function hist() to
make our histogram."

A block of code is set as follows: 

hist(air$Nitrogen.Oxides,
     breaks=20,
     xlab="Nitrogen Oxide Concentrations",
     main="Distribution of Nitrogen Oxide Concentrations")

When we wish to draw your attention to a particular part of a code block, the relevant lines or 
items are set in bold: 

hist(air$Nitrogen.Oxides,
     breaks=40,
     xlab="Nitrogen Oxide Concentrations",
     main="Distribution of Nitrogen Oxide Concentrations")


New terms and important words are shown in bold. Words that you see on the screen, in 
menus or dialog boxes for example, appear in the text like this: "Select an appropriate mirror 
site from the CRAN mirror window." 
Warnings or important notes appear in a box like this.


Tips and tricks appear like this. 



Reader feedback 

Feedback from our readers is always welcome. Let us know what you think about this 
book—what you liked or may have disliked. Reader feedback is important for us to develop 
titles that you really get the most out of. 

To send us general feedback, simply send an e-mail to feedback@packtpub.com, and
mention the book title in the subject of your message.

If there is a book that you need and would like to see us publish, please send us a note in the
SUGGEST A TITLE form on www.packtpub.com or e-mail suggest@packtpub.com.

If there is a topic that you have expertise in and you are interested in either writing or
contributing to a book, see our author guide on www.packtpub.com/authors.


Customer support 

Now that you are the proud owner of a Packt book, we have a number of things to help you to 
get the most from your purchase. 



Downloading the example code for this book 
You can download the example code files for all Packt books you have
purchased from your account at http://www.PacktPub.com. If you
purchased this book elsewhere, you can visit http://www.PacktPub.com/support
and register to have the files e-mailed directly to you.
Errata

Although we have taken every care to ensure the accuracy of our content, mistakes 
do happen. If you find a mistake in one of our books—maybe a mistake in the text or the 
code—we would be grateful if you would report this to us. By doing so, you can save other 
readers from frustration and help us improve subsequent versions of this book. If you find 
any errata, please report them by visiting http://www.packtpub.com/support,
selecting your book, clicking on the errata submission form link, and entering the details
of your errata. Once your errata are verified, your submission will be accepted and the
errata will be uploaded on our website, or added to any list of existing errata, under the
Errata section of that title. Any existing errata can be viewed by selecting your title from
http://www.packtpub.com/support.


Piracy 

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, 
we take the protection of our copyright and licenses very seriously. If you come across any 
illegal copies of our works, in any form, on the Internet, please provide us with the location 
address or website name immediately so that we can pursue a remedy. 

Please contact us at copyright@packtpub.com with a link to the suspected
pirated material.

We appreciate your help in protecting our authors, and our ability to bring you 
valuable content. 

Questions 

You can contact us at questions@packtpub.com if you are having a problem with any
aspect of the book, and we will do our best to address it.





Basic Graph Functions


In this chapter, we will cover the following recipes: 

► Creating scatter plots 

► Creating line graphs 

► Creating bar charts 

► Creating histograms and density plots 

► Creating box plots 

► Adjusting X and Y axis limits 

► Creating heat maps 

► Creating pairs plots 

► Creating multiple plot matrix layouts 

► Adding and formatting legends 

► Creating graphs with maps 

► Saving and exporting graphs 


Introduction 


In this chapter, we will see how to use R to make some very basic types of graphs, which are 
likely to be used in almost any kind of analysis. The recipes in this chapter will give you a feel 
for how much can be accomplished with very little R code, which is one big reason why R is a 
good choice for an analysis platform. 
Although the examples in this chapter are of a basic nature, we will go through all the steps
to get you going from reading your data into R, making a first graph, tweaking it to suit your
needs, and then saving and exporting it for use in presentations and publications.

First and foremost, you need to download and install R on your computer. R and most R packages are
hosted on the Comprehensive R Archive Network or CRAN (http://cran.r-project.org/).
R is available for all three major operating systems at the following locations on
the web:

► Windows: http://cran.r-project.org/bin/windows/base/

► Linux: http://cran.r-project.org/bin/linux/

► Mac OS X: http://cran.r-project.org/bin/macosx/


Please read the FAQs (http://cran.r-project.org/faqs.html) and manuals
(http://cran.r-project.org/manuals.html) on the CRAN site for detailed help
on installation.

Just having the base installation of R should set you up for all the recipes in this book. 

Please note that the R code in this book has some comments explaining the code. Any text 
on a line following the # symbol is treated by R as a comment. For example, you may see 
something like this: 

col="yellow" #Setting the color to yellow

As you can see, the text after the # explains what the code is doing, in this case setting the
color to yellow. Comments are a way of documenting code so that others reading it can
understand it better. They also help you understand your own code when you come back to
it after a long period of time. Please read each line of code carefully and look out for any
comments that will help you understand the code better.


Creating scatter plots 


This recipe describes how to make scatter plots using some very simple commands. We'll 
go from a single line of code, which makes a scatter plot from pre-loaded data, to a script of a 
few lines that produces a scatter plot customized with colors, titles, and axes limits specified 
by us. 


Getting ready 


All you need to do to get started is start R. You should have the R prompt on your screen 
as shown in the following screenshot: 
Let's use one of R's inbuilt datasets called cars to look at the relationship between the speed
of cars and the distances taken to stop (recorded in the 1920s).


To make your first scatter plot, type the following command at the R prompt: 


plot(cars$dist~cars$speed)


This should bring up a window with the following graph showing the relationship between the 
distance travelled by cars plotted with their speeds: 

Now, let's tweak the graph to make it look better. Type the following code at the R prompt: 
plot(cars$dist~cars$speed, # y~x
     main="Relationship between car distance & speed", # Plot Title
     xlab="Speed (miles per hour)", #X axis title
     ylab="Distance travelled (feet)", #Y axis title (dist is recorded in feet)
     xlim=c(0,30), #Set x axis limits from 0 to 30
     ylim=c(0,140), #Set y axis limits from 0 to 140
     xaxs="i", #Set x axis style as internal
     yaxs="i", #Set y axis style as internal
     col="red", #Set the color of plotting symbol to red
     pch=19) #Set the plotting symbol to filled dots

This should produce the following result: 
R comes preloaded with many datasets. In the example, we used one such dataset called
cars, which has two columns of data, with the names speed and dist. To see the data,
simply type cars at the R prompt and press Enter:

> cars
   speed dist
1      4    2
2      4   10
3      7    4
4      7   22
...
47    24   92
As the output from the R command line shows, the cars dataset has two columns and 50
rows of data.
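Printing all 50 rows is not the only way to check this. As a small sketch using only standard base R functions, you can ask for the dimensions, column names, and a few rows directly:

```r
# Inspect the built-in cars dataset without printing every row
dim(cars)      # number of rows and columns
names(cars)    # column names
head(cars, 4)  # first four rows, as in the listing above
summary(cars)  # minimum, quartiles, mean, and maximum of each column
```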

The plot() command is the simplest way to make scatter plots (and other types of plots, as
we'll see in a moment).

In the first example, we simply pass the x and y arguments that we want to plot in the form
plot(y~x); that is, we want to plot distance versus speed. This produces a simple scatter
plot. In the second example, we pass a few additional arguments that provide R with more
information on how we want the graph to look.

The main argument sets the plot title, xlab and ylab set the X and Y axes titles respectively,
xlim and ylim set the minimum and maximum values of the X and Y axes respectively,
xaxs and yaxs set the style of the axes, and col and pch set the scatter plot
symbol color and type respectively. All of these arguments and more will be explained in
detail in Chapter 2, Beyond the Basics.


There's more... 


Instead of the plot(y~x) notation used in the preceding examples, you can also use
plot(x, y). For more details on all the arguments the plot() command can take, see the
help documentation by typing ?plot or help(plot) at the R prompt.

If you want to plot another set of points on the same graph, say from another dataset or the
same data points but with another symbol on top, you can use the points() function after
plotting the first dataset with plot():

points(cars$dist~cars$speed, pch=3)

A note on R's inbuilt datasets

In addition to the cars dataset used in the example, R has many more datasets, which come
as part of the base installation in a package called datasets. To see the complete list of
available datasets, call the data() function simply by running it at the R prompt:

data()
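Called without arguments, data() shows its listing in a separate window or pager. If you would rather have the dataset names as a character vector to search or filter, a minimal sketch is to ask for the datasets package explicitly and read the Item column of the returned listing:

```r
# Query the datasets package directly instead of opening the full listing
ds <- data(package = "datasets")
items <- ds$results[, "Item"]  # one entry per dataset; a few carry an alias in parentheses
length(items)                  # how many datasets ship with this R installation
head(items, 10)
c("cars", "islands", "mtcars") %in% items
```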


See also 


Scatter plots are covered in a lot more detail in Chapter 3, Creating Scatter Plots. 






Creating line graphs

Line graphs are generally used for looking at trends in data over time, so the X variable is
usually time expressed as time of the day, date, month, year, and so on. In this recipe, we will
see how we can quickly plot such data using the same plot() function, which was used in
the previous recipe to make scatter plots.

Getting ready

First we need to load the dailysales.csv example data file. You can download this file
from the code download section of the book's companion website:

sales <- read.csv("dailysales.csv", header=TRUE)

As the file name suggests, it contains daily sales data of a product. It has two columns: a date
column and a sales column showing the number of units sold.

How to do it...

Here's the code to make your first line graph:

plot(sales$units~as.Date(sales$date, "%d/%m/%Y"),
     type="l", #Specify type of plot as l for line
     main="Unit Sales in the month of January 2010",
     xlab="Date",
     ylab="Number of units sold",
     col="blue")

[Figure: line graph of Number of units sold against Date]




We first read the data file using the read.csv() function. We passed two arguments to
the function: the name of the file we want to read (dailysales.csv in double quotes) and,
with header=TRUE, we specified that the first row contains column headings. We read the
contents of the file and saved it in an object called sales with the left arrow notation.

You may have noticed that the plotting code is quite similar to that for producing a scatter
plot. The main difference is that this time we passed the type argument. The type argument
tells the plot() function whether you want to plot points, lines, or other symbols. It can take
nine different values.


Please see the help section on plot() for more details. The default
value of type is "p" as in points.

If the type is not specified, R assumes you want to plot points as it did in the scatter
plot example.
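To see the nine values side by side, here is a small sketch of our own (not part of the recipe) that draws the first ten rows of the built-in cars data once per type value. The 3 x 3 layout and the margin sizes passed to par() are arbitrary choices made to fit all nine panels:

```r
# Draw the first ten rows of cars with each of the nine values of 'type'
types <- c("p", "l", "b", "c", "o", "h", "s", "S", "n")
old <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))  # 3 x 3 grid, small margins
for (tp in types) {
  plot(cars$speed[1:10], cars$dist[1:10], type = tp,
       main = paste0('type = "', tp, '"'))
}
par(old)  # restore the previous layout and margins
```

Note that type = "n" sets up the axes but draws nothing, which is handy when you want to add points or lines later.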



The most important part of the example is the way we read the date using the as.Date()
function. Reading dates in R is a bit tricky: apart from the standard yyyy-mm-dd and yyyy/mm/dd
forms, R doesn't automatically recognize date formats. The as.Date() function takes two arguments:
the first is the variable which contains the date values and the second is the format the date
values are stored in. In the example, the dates are in the form day/month/year or dd/mm/yyyy,
which we specified as %d/%m/%Y in the function call (%Y stands for a four-digit year, while %y
is for two-digit years). If the dates were in mm/dd/yyyy format, we'd use %m/%d/%Y.
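The difference between %Y and %y is easy to get wrong, and R does not warn you. The following sketch uses made-up date strings (not values from dailysales.csv) to show several formats parsing to the same day, plus the silent failure you get when a four-digit year is read with %y:

```r
# Four-digit years need %Y; two-digit years need %y
d1 <- as.Date("15/01/2010", format = "%d/%m/%Y")  # dd/mm/yyyy
d2 <- as.Date("15/01/10",   format = "%d/%m/%y")  # dd/mm/yy
d3 <- as.Date("01/15/2010", format = "%m/%d/%Y")  # mm/dd/yyyy
d4 <- as.Date("2010-01-15")                       # ISO yyyy-mm-dd: no format needed
print(c(d1, d2, d3, d4))  # all four are the same day

# Pitfall: %y reads only two digits ("20"), ignores the trailing "10",
# and returns the wrong year without any warning
wrong <- as.Date("15/01/2010", format = "%d/%m/%y")
format(wrong, "%Y")
```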


The plot and axes titles and line color are set using the same arguments as for the scatter plot. 


There's more... 


If you want to plot another line on the same graph, say daily sales data of a second product,
you can use the lines() function:

lines(sales$units2~as.Date(sales$date, "%d/%m/%Y"),
      col="red")


See also 


Line graphs and time series charts are covered in depth in Chapter 4, Creating Line Graphs
and Time Series Charts.
Creating bar charts


In this recipe, we will learn how to make bar plots, which are useful for visualizing summary 
data across various categories, such as sales of products or results of elections. 


Getting ready 


First we need to load the citysales.csv example data file. You can download this file from
the code download section of the book's companion website:

sales <- read.csv("citysales.csv", header=TRUE)



Just like the plot() function we used to make scatter plots and line graphs in the earlier
recipes, the barplot() and dotchart() functions are part of the base graphics library
in R. This means that we don't need to install any additional packages or libraries to use
these functions.


We can make bar plots using the barplot() function as follows:

barplot(sales$ProductA,
        names.arg=sales$City,
        col="black")
The default setting of orientation for bars is vertical. To change the bars to horizontal, use the
horiz argument (by default, it is set to FALSE):

barplot(sales$ProductA,
        names.arg=sales$City,
        horiz=TRUE,
        col="black")


The first argument of the barplot() function is either a vector or matrix of values which you
want to plot as bars, such as the sales data variables in the examples we have just seen. The
labels for the bars are specified by the names.arg argument, but we use this argument only
when plotting a single set of bars. In the example with sales figures for multiple products, we didn't
specify names.arg. R automatically used the product names as the labels and we had to
instead specify the city names as the legend.




As with the other types of plots, the col argument is used to specify the color of the bars.
This is a common convention throughout R: col is used to set the color of the main feature
in any kind of graph.
There's more...


Bar plots are often used to compare groups of values across categories.
For example, we can plot the sales in different cities for more than one product using
the beside argument:

barplot(as.matrix(sales[,2:4]), beside=TRUE,
        legend=sales$City,
        col=heat.colors(5),
        border="white")


[Figure: grouped bar chart with one group of bars each for ProductA, ProductB, and ProductC, and a legend identifying the cities]


You will notice that when plotting data for multiple products (columns), we used the square
bracket notation in the form sales[,2:4]. In R the square bracket notation is used to refer
to specific columns and rows of a dataset. For example, sales[2,3] refers to the value in
the second row and the third column.

So the notation is of the form sales[row,column]. If you want to refer to all the rows in a
certain column you can omit the row number. For example, if you want to refer to all the rows
in column two, you would use sales[,2]. Similarly, for all the columns of row three, you
would use sales[3,].
So sales[,2:4] refers to all the data in columns two to four, which is the product sales data
shown in the following table (column one holds the city names):

City           ProductA  ProductB  ProductC
San Francisco        23        11        12
London               89         6        56
Tokyo                24         7        13
Berlin               36        34        44
Mumbai                3        78        14
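You can try the square bracket notation without the CSV file. The sketch below types the values from the table into a data frame (we assume citysales.csv has exactly these columns, in this order) and then applies each of the index forms described above:

```r
# Rebuild the city sales table shown above as a data frame
sales <- data.frame(
  City     = c("San Francisco", "London", "Tokyo", "Berlin", "Mumbai"),
  ProductA = c(23, 89, 24, 36, 3),
  ProductB = c(11, 6, 7, 34, 78),
  ProductC = c(12, 56, 13, 44, 14),
  stringsAsFactors = FALSE
)
sales[2, 3]   # row 2 (London), column 3 (ProductB)
sales[, 2]    # every row of column 2 (ProductA)
sales[3, ]    # every column of row 3 (Tokyo)
sales[, 2:4]  # the three product columns only
```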


The orientation of bars is set to vertical by default. It is controlled by the optional horiz
(for horizontal) argument. If we do not use this argument in our barplot() function call,
it is set to FALSE. To make the bars horizontal, we set horiz to TRUE.

The beside argument is used to specify whether we want the bars in a group of data to be
stacked or adjacent to each other. By default, beside is set to FALSE, which produces a
stacked bar graph. To make the bars adjacent, we set beside to TRUE.
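The effect of beside is clearest when both versions are drawn next to each other. This sketch again types in the table values rather than reading citysales.csv, and it also uses the fact that barplot() invisibly returns the positions of the bar midpoints: a vector with one value per stacked bar, or a matrix with one value per individual bar when beside=TRUE:

```r
# Rebuild the city sales table shown earlier
sales <- data.frame(
  City     = c("San Francisco", "London", "Tokyo", "Berlin", "Mumbai"),
  ProductA = c(23, 89, 24, 36, 3),
  ProductB = c(11, 6, 7, 34, 78),
  ProductC = c(12, 56, 13, 44, 14),
  stringsAsFactors = FALSE
)
m <- as.matrix(sales[, 2:4])  # 5 cities x 3 products
rownames(m) <- sales$City

old <- par(mfrow = c(1, 2))
# beside = FALSE (default): one stacked bar per product
stacked <- barplot(m, col = heat.colors(5), main = "beside = FALSE")
# beside = TRUE: one bar per city within each product group
grouped <- barplot(m, beside = TRUE, col = heat.colors(5), main = "beside = TRUE")
par(old)

stacked     # x position of each stacked bar (3 values)
grouped     # x position of every city/product bar (5 x 3 matrix)
colSums(m)  # heights of the stacked bars
```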


To change the color of the border around the bars, we used the border argument. The
default border color is black, but if you wish to use another color, say white, you can set
it with border="white".


To make the same graph with horizontal bars we would type:

barplot(as.matrix(sales[,2:4]), beside=TRUE,
        legend=sales$City,
        col=heat.colors(5),
        border="white",
        horiz=TRUE)


See also 


Bar charts will be explored in a lot more detail with some advanced recipes in Chapter 5, 
Creating Bar, Dot, and Pie Charts. 


Creating histograms and density plots 


In this recipe, we will learn how to make histograms and density plots, which are useful to look 
at the distribution of values in a dataset. 






How to do it... 
The hist() function is also a function of R's base graphics library. It takes only one
compulsory argument, that is, the variable whose distribution of values we wish to visualize.

In the first example, we passed the rnorm() function as the variable. rnorm(1000)
generates a vector of 1,000 random numbers with a normal distribution. As you can see
in the histogram, the distribution has a bell shape.

In the second example, we passed the inbuilt islands dataset (which gives the areas of
the world's major landmasses, in thousands of square miles) as the argument to hist(). As you
can see from that histogram, islands has a distribution skewed heavily towards the lowest
value range of 0 to 2,000 (that is, up to two million square miles).
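The calls for these two examples are not reproduced in this copy of the recipe. Based on the description above, they can be sketched as follows. The set.seed() call and the density() plot are our own additions: the first makes the random draws repeatable, and the second is an assumption about how the density plot named in the recipe title might be drawn:

```r
set.seed(1)              # make the random draws reproducible
x <- rnorm(1000)         # 1,000 values from a standard normal distribution
h1 <- hist(x)            # first example: a bell-shaped histogram
h2 <- hist(islands)      # second example: heavily right-skewed landmass areas
h2$breaks                # bin edges chosen automatically by R
h2$counts                # number of landmasses in each bin

# Density plot: a smoothed estimate of the same distribution
plot(density(islands), main = "Density of landmass areas")
```

Assigning the result of hist() still draws the histogram, and the returned object lets you check exactly how many values fell into each bin.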


Concentration in ng per cubic metre
Metal Concentrations in London")


See also 


We will cover more details such as setting the breaks, density, formatting of bars and other 
advanced recipes in Chapter 6, Creating Histograms. 


Creating box plots 


In this recipe, we will learn how to make box plots, which are useful in comparing the spread 
of values in different measurements. 


Getting ready 


First we need to load the metals.csv example data file, which contains measurements
of metal concentrations in London's air. You can download this file from the code download
section of the book's companion website:

metals <- read.csv("metals.csv", header=TRUE)


We can make a box plot to summarize the metal concentration data using the boxplot()
command as follows:
The main argument the boxplot() function takes is a set of numeric values (in the form of
a vector or data frame). In our first example, we used a dataset containing numerical values
of air pollution data from London. The dark line inside the box for each metal represents the
median of values for that metal. The bottom and top edges of the box represent the first and
third quartiles respectively (strictly, R draws Tukey's hinges, which are very close to the
quartiles). Thus, the length of the box is equal to the interquartile range (IQR, the
difference between the first and third quartiles). The maximum length of a whisker is a multiple
of the IQR (the default multiplier, set by the range argument, is 1.5). Each whisker ends at the
most extreme data point that lies within this maximum length.

All the points lying beyond these whiskers are considered outliers.

As with most other plot types, the common arguments such as xlab, ylab, and main can be
used to set the titles for the X and Y axes and the graph itself respectively.


There's more...


We can also make another type of box plot where we can group the observations by
categories. For example, if we want to study the spread of copper concentrations by the
source of the measurements, we can use a formula to include the source. First we need to
read the copper_site.csv example data file, as follows:

copper <- read.csv("copper_site.csv", header=TRUE)

Then we can add the following code:

boxplot(copper$Cu~copper$Source,
        xlab="Measurement Site",
        ylab="Atmospheric Concentration of Copper in ng per cubic metre",
        main="Atmospheric Copper Concentrations in London")

[Figure: box plots titled "Atmospheric Copper Concentrations in London", one box per measurement site]
In this example, the boxplot() function takes a formula as an argument. This formula, in the
form value ~ group (Cu ~ Source), specifies a column of values and the group of categories it
should be summarized over.

More detailed box plot recipes will be presented in Chapter 7, Creating Box and Whisker Plots.


Adjusting X and Y axes limits


In this recipe, we will learn how to adjust the X and Y limits of plots, which is useful in
adjusting a graph to suit one's presentation needs and adding additional data to the
same plot.


How to do it...


We will modify our first scatter plot example to demonstrate how to adjust axes limits:
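The call for the axis-limits example is missing from this copy. Going by the explanation that follows (the cars scatter plot with the X axis set to 0 to 30 and the Y axis to 0 to 150), it can be sketched as below; the axis titles are our own additions:

```r
# Scatter plot of stopping distance against speed with custom axis limits
plot(cars$dist ~ cars$speed,
     xlim = c(0, 30),    # X axis from 0 to 30
     ylim = c(0, 150),   # Y axis from 0 to 150
     xlab = "Speed (miles per hour)",
     ylab = "Stopping distance (feet)")
```

Limits of this kind should enclose all of the data; otherwise points outside the range are simply not drawn.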
In our original scatter plot in the first recipe of this chapter, R set the x axis limits
automatically to just below 5 and up to 25, and the y axis limits from 0 to 120. In this example, we
set the x axis limits to 0 to 30 and the y axis limits to 0 to 150 using the xlim and ylim
arguments respectively.

Both xlim and ylim take a vector of length 2 as valid values in the form
c(minimum, maximum); that is, xlim=c(0, 30) means set the x axis minimum
limit to 0 and maximum limit to 30.


There's more... 


You may have noticed that even after setting the x and y limit values, there is some gap left at
both ends of each axis, so the zeros of the two axes don't meet at the corner. This is because R
automatically adds some additional space at both ends of the axes (by default, 4 percent of the
axis range at each end), so that if there are any data points at the extremes, they are not cut
off by the axes. If you wish to set the axes limits to exact values, in addition to specifying xlim
and ylim, you must also set the xaxs and yaxs arguments to "i":

plot(cars$dist~cars$speed,
     xlim=c(0,30),
     ylim=c(0,150),
     xaxs="i",
     yaxs="i")
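You can measure the padding rather than eyeball it. In this sketch, par("usr") reports the plot region that R actually used after each call, as the four values x1, x2, y1, y2. With the default style "r" each end is extended by 4 percent of the range (1.2 on a 0 to 30 axis and 6 on a 0 to 150 axis), while style "i" uses the limits exactly:

```r
old <- par(mfrow = c(1, 2))
# Default style "r": limits are padded by 4% of the range at each end
plot(cars$dist ~ cars$speed, xlim = c(0, 30), ylim = c(0, 150),
     main = 'xaxs = "r" (default)')
usr_r <- par("usr")  # plot region actually used: x1, x2, y1, y2

# Style "i": the plot region matches xlim and ylim exactly
plot(cars$dist ~ cars$speed, xlim = c(0, 30), ylim = c(0, 150),
     xaxs = "i", yaxs = "i", main = 'xaxs = "i"')
usr_i <- par("usr")
par(old)

usr_r  # -1.2 31.2 -6 156
usr_i  # 0 30 0 150
```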
Sometimes, we may wish to reverse a data axis, say to plot the data in descending order along
one axis. All we have to do is swap the minimum and maximum values in the vector argument
supplied as xlim or ylim. So, if we want the X axis speed values in the previous graph in
descending order, we need to set xlim to c(30,0):

plot(cars$dist~cars$speed,
     xlim=c(30, 0),
     ylim=c(0,150))

There will be a few more recipes on adjusting the axes tick marks and labels in Chapter 2, 
Beyond the Basics. 


Creating heat maps

Heat maps are colorful images, which are very useful for summarizing a large amount of data
by highlighting hotspots or key trends in the data.


How to do it...


There are a few different ways to make heat maps in R. The simplest is to use the heatmap()
function, which comes with the default R installation (it is part of the stats package):


heatmap(as.matrix(mtcars),
        Rowv=NA,
        Colv=NA,
        col=heat.colors(256),
        scale="column",
        margins=c(2,8),
        main="Car characteristics by Model")


[Figure: heat map titled "Car characteristics by Model", with one row per car model (from Mazda RX4 to Volvo 142E) and one column per mtcars variable]
The example code has a lot of arguments, so it may look difficult at first sight. But if we
consider each argument in turn, we can understand how it works. The first argument to
the heatmap() function is the dataset. We are using the inbuilt dataset mtcars, which
holds data such as fuel efficiency (mpg), number of cylinders (cyl), weight (wt), and so
on for different models of cars. The data needs to be in a matrix format, so we use the
as.matrix() function. Rowv and Colv specify if and how dendrograms should be
displayed to the left and top of the heat map.

See help(dendrogram) and http://en.wikipedia.org/wiki/Dendrogram for details on dendrograms.



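In our example, we suppress them by setting the two arguments to NA, the logical constant
R uses to indicate a missing value. The col argument sets the colors of the cells, here 256
shades from heat.colors(). The scale argument tells R whether the values should be centered
and scaled by row, by column, or not at all. We have set it to "column", which means each
variable (column) is scaled on its own, so the color gradient is calculated on a per-column
basis. This matters for mtcars because its columns have very different ranges (compare disp,
in cubic inches, with am, which is either 0 or 1). Finally, margins sets the space, in lines
of text, left for the column and row names, and main sets the plot title.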
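To see what scale = "column" amounts to, the sketch below standardizes each column of mtcars by hand with scale() (subtract the column mean, divide by the column standard deviation) and then calls heatmap() with scale = "none". It assumes that "centered and scaled", as heatmap()'s help page puts it, means this usual z-score, in which case both plots should use the same colors. pdf(NULL) is only there so the sketch runs without opening a window.

```r
# Standardize each mtcars column by hand: (x - column mean) / column sd
m <- as.matrix(mtcars)
z <- scale(m)

# Every column of z now has mean 0 and standard deviation 1
print(round(colMeans(z), 10))
print(apply(z, 2, sd))

pdf(NULL)  # null device: nothing is drawn to screen or file
heatmap(z,
        Rowv = NA, Colv = NA,
        col = heat.colors(256),
        scale = "none",        # the columns were already scaled above
        margins = c(2, 8),
        main = "Car characteristics by Model")
invisible(dev.off())
```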


There's more... 


Heat maps are very useful for looking at correlations between variables in a large dataset. 
For example, in bioinformatics, heat maps are often used to study the correlations between 
groups of genes. 

Let's look at an example with the genes.csv example data file. Let's first load the file:

genes <- read.csv("genes.csv", header = TRUE)

Let's use the image() function to create a correlation heat map:

rownames(genes) <- colnames(genes)

image(x = 1:ncol(genes),
      y = 1:nrow(genes),
      z = t(as.matrix(genes)),
      axes = FALSE,
      xlab = "",
      ylab = "",
      main = "Gene Correlation Matrix")

axis(1, at = 1:ncol(genes), labels = colnames(genes), col = "white",
     las = 2, cex.axis = 0.8)

axis(2, at = 1:nrow(genes), labels = rownames(genes), col = "white",
     las = 1, cex.axis = 0.8)
[Figure: "Gene Correlation Matrix", a grid of colored cells with the gene names (Gene1 to Gene20) along the axes]
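Since genes.csv is not reproduced here, the sketch below invents a stand-in: 20 hypothetical genes (Gene1 to Gene20) with 50 random expression values each, whose correlation matrix, computed with cor(), plays the role of the square table the recipe reads from the file. The plotting calls mirror the recipe; the random data are purely illustrative.

```r
set.seed(42)
# Hypothetical expression data: 50 samples (rows) x 20 genes (columns)
expr <- matrix(rnorm(50 * 20), nrow = 50,
               dimnames = list(NULL, paste0("Gene", 1:20)))
genes <- as.data.frame(cor(expr))   # 20 x 20 correlation matrix

rownames(genes) <- colnames(genes)

pdf(NULL)
image(x = 1:ncol(genes),
      y = 1:nrow(genes),
      z = t(as.matrix(genes)),       # image() wants x along rows of z
      axes = FALSE, xlab = "", ylab = "",
      main = "Gene Correlation Matrix")
axis(1, at = 1:ncol(genes), labels = colnames(genes), col = "white",
     las = 2, cex.axis = 0.8)
axis(2, at = 1:nrow(genes), labels = rownames(genes), col = "white",
     las = 1, cex.axis = 0.8)
invisible(dev.off())
```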



We have used a few new commands and arguments in this example, especially for formatting
the axes. We will discuss these in detail starting in Chapter 2, Beyond the Basics, and in
more examples in later chapters.


Heat maps will be explained in a lot more detail with more examples in Chapter 8, Creating 
Heat Maps. 


Creating pairs plots


A pairs plot is a matrix of scatter plots, which is a very handy visualization for quickly scanning
the correlations between many variables in a dataset.


How it works...


As you can see in the figure, the pairs() command makes a matrix of scatter plots, where
all the variables in the specified dataset are plotted against each other. The variable names,
displayed on the diagonal running from the top left to the bottom right, are the key to
reading the graph. For example, the scatter plot in the first row and second column shows the
relationship between Sepal Length on the Y axis and Sepal Width on the X axis.
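The pairs() call behind the figure is not reproduced in this text. Since the plot() version in the next section is said to work in exactly the same manner, the original was most likely close to the sketch below. This is an assumption, not the book's verbatim code; it uses the first four (numeric) columns of the built-in iris data.

```r
pdf(NULL)
pairs(iris[, 1:4],
      main = "Relationships between characteristics of iris flowers",
      pch = 19,
      col = "blue",
      cex = 0.9)
invisible(dev.off())

# Diagonal labels, top left to bottom right; the panel in row 1, column 2
# has vars[1] on its Y axis and vars[2] on its X axis
vars <- names(iris)[1:4]
print(vars)
```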
There's more...


Here's a fun fact: we can produce the previous graph using the plot() function instead of
pairs() in exactly the same manner:

plot(iris[, 1:4],
     main = "Relationships between characteristics of iris flowers",
     pch = 19,
     col = "blue",
     cex = 0.9)


[Figure: scatter plot matrix titled "Relationships between characteristics of iris flowers"]
So if you pass a data frame with more than two variables to the plot() function, it creates a
scatter plot matrix by default (plot() hands such data frames over to pairs()). We've also
added a plot title and modified the plotting symbol style, color, and size using the pch, col,
and cex arguments respectively. We'll delve into the details of these settings in Chapter 2,
Beyond the Basics.
See also


We'll cover some more interesting recipes in Chapter 3, Creating Scatter Plots, building upon
the things we learn in Chapter 2.


Creating multiple plot matrix layouts


In this recipe, we will learn how to present more than one graph in a single image. Pairs plots
are one example, as we saw in the last recipe, but here we will learn how to include different
types of graphs in each cell of a graph matrix.


How to do it...


Let's say we want to make a 2x3 matrix of graphs, made of two rows and three columns of
graphs. We use the par() command as follows:

par(mfrow = c(2, 3))
plot(rnorm(100))
plot(rnorm(100))
plot(rnorm(100))
plot(rnorm(100))
plot(rnorm(100))
plot(rnorm(100))


How it works...
The par() command is by far the most important function for customizing graphs in R. It is
used to set and query many graphical parameters (hence par), which control the layout and
appearance of graphs.

Please note that we need to issue the par() command before the actual graph commands.
When you first run the par() command, only a blank graphics window appears, because par()
only sets the parameters for any subsequent graphs. The mfrow argument is used
to specify how many rows and columns of graphs we wish to plot. It takes
values in the form of a vector of length two: c(nrow, ncol). The first number specifies the
number of rows and the second specifies the number of columns. In our previous example,
we wanted a matrix of two rows and three columns, so we set mfrow to c(2, 3).
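Because par() both sets and queries parameters, a common pattern, sketched below, is to keep the old settings that par() returns when you change something and hand them back to par() when you are done. The names op, layout_used, and layout_after are just illustrative.

```r
pdf(NULL)
op <- par(mfrow = c(2, 3))       # set a 2 x 3 layout; op holds the old value
layout_used <- par("mfrow")      # query the current setting
for (i in 1:6) plot(rnorm(100))  # six plots fill the grid row by row
par(op)                          # restore the previous layout (1 x 1)
layout_after <- par("mfrow")
invisible(dev.off())

print(layout_used)
print(layout_after)
```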

Note that there is another argument, mfcol, similar to mfrow, which can also be used to
create multiple plot layouts. mfcol also takes a two-value vector specifying the number of
rows and columns in the matrix. The difference is that mfcol draws subsequent figures by
columns, rather than by rows as mfrow does. So, if we used mfcol instead of mfrow in the
earlier example, we would get the following plot:


[Figure: six plots of rnorm(100) in two rows and three columns, filled column by column]
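A quick way to see the two fill orders side by side is to number the panels. The sketch below draws six empty panels labeled 1 to 6 and asks par("mfg") which row and column each one landed in, first under mfrow and then under mfcol; panel_positions() is a helper defined here for illustration, not an R function.

```r
# Record which grid cell each of six panels lands in
panel_positions <- function(layout_arg) {
  pdf(NULL)
  on.exit(dev.off())
  par(layout_arg)                  # e.g. list(mfrow = c(2, 3))
  pos <- matrix(NA_integer_, nrow = 6, ncol = 2,
                dimnames = list(paste("plot", 1:6), c("row", "col")))
  for (k in 1:6) {
    plot.new()                     # start the next panel
    text(0.5, 0.5, k, cex = 3)     # label it with its drawing order
    pos[k, ] <- par("mfg")[1:2]    # row and column of the current panel
  }
  pos
}

by_row <- panel_positions(list(mfrow = c(2, 3)))
by_col <- panel_positions(list(mfcol = c(2, 3)))
print(by_row)   # plot 2 sits in row 1, column 2
print(by_col)   # plot 2 sits in row 2, column 1
```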
































Basic Graph Functions 


There's more... 


Let's look at a practical example where a multiple plot layout would be useful. Let's read the 
dailymarket.csv example file that contains data on the daily revenue, profits, and number 
of customer visits for a shop: 

market <- read.csv("dailymarket.csv", header = TRUE)

Now, let's plot all the three variables over time in a plot matrix with the graphs stacked over 
one another: 

par(mfrow = c(3, 1))

plot(market$revenue ~ as.Date(market$date, "%d/%m/%y"),
     type = "l", # Specify type of plot as l for line
     main = "Revenue",
     xlab = "Date",
     ylab = "US Dollars",
     col = "blue")

plot(market$profits ~ as.Date(market$date, "%d/%m/%y"),
     type = "l", # Specify type of plot as l for line
     main = "Profits",
     xlab = "Date",
     ylab = "US Dollars",
     col = "red")

plot(market$customers ~ as.Date(market$date, "%d/%m/%y"),
     type = "l", # Specify type of plot as l for line
     main = "Customer visits",
     xlab = "Date",
     ylab = "Number of people",
     col = "black")


The preceding graph is a good way to visualize variables with different value ranges over
the same time period. It helps in identifying where the trends match each other and where 
they differ. 
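Because dailymarket.csv is not included here, the sketch below builds a hypothetical data frame with the same column names (date, revenue, profits, customers), filled with random numbers, and draws the same three stacked panels. It uses the plot(x, y) form rather than the formula form; with type = "l" both draw a line against the dates. Saving par()'s return value and restoring it afterwards is a choice of this sketch.

```r
# Hypothetical stand-in for dailymarket.csv: 30 days of invented figures
set.seed(1)
market <- data.frame(
  date      = format(seq(as.Date("2010-01-01"), by = "day", length.out = 30),
                     "%d/%m/%y"),
  revenue   = round(runif(30, 800, 1200)),
  profits   = round(runif(30, 50, 200)),
  customers = round(runif(30, 80, 150))
)
dates <- as.Date(market$date, "%d/%m/%y")   # day/month/two-digit year

pdf(NULL)
op <- par(mfrow = c(3, 1))   # three stacked panels sharing the same dates
plot(dates, market$revenue, type = "l", main = "Revenue",
     xlab = "Date", ylab = "US Dollars", col = "blue")
plot(dates, market$profits, type = "l", main = "Profits",
     xlab = "Date", ylab = "US Dollars", col = "red")
plot(dates, market$customers, type = "l", main = "Customer visits",
     xlab = "Date", ylab = "Number of people", col = "black")
par(op)
invisible(dev.off())
```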


See also


We will explore more examples and uses of multiple plot layouts in later chapters.


Adding and formatting legends


In this recipe, we will learn how to add and format legends to graphs.


Getting ready


First we need to load the cityrain.csv example data file, which contains monthly rainfall
data for four major cities across the world. You can download this file from the code download
section of the book's companion website:

rain <- read.csv("cityrain.csv", header = TRUE)
How to do it...

In the bar plots recipe, we already saw that we can add a legend by passing the legend
argument to the barplot() function. Now we will see how to use the legend()
function to add and customize a legend for any type of graph.

Let's first draw a graph with multiple lines representing the rainfall in the four cities:

plot(rain$Tokyo, type = "l", col = "red",
     ylim = c(0, 300),
     main = "Monthly Rainfall in major cities",
     xlab = "Month of Year",
     ylab = "Rainfall (mm)",
     lwd = 2)

lines(rain$NewYork, type = "l", col = "blue", lwd = 2)
lines(rain$London, type = "l", col = "green", lwd = 2)
lines(rain$Berlin, type = "l", col = "orange", lwd = 2)


Now let's add the legend to mark which line represents which city: 
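The legend() call itself did not survive in this text. Going by the description in How it works... below (position "topright", the four city names, colors in the same order as the lines, plus a line type and width), it was presumably close to this sketch. The rainfall numbers are invented so it runs without cityrain.csv, and lty = 1 (solid lines, matching the plotted lines) is an assumption.

```r
# Invented monthly rainfall (mm) standing in for cityrain.csv
set.seed(7)
rain <- data.frame(Tokyo   = round(runif(12, 40, 230)),
                   NewYork = round(runif(12, 70, 110)),
                   London  = round(runif(12, 40, 70)),
                   Berlin  = round(runif(12, 30, 70)))

pdf(NULL)
plot(rain$Tokyo, type = "l", col = "red", ylim = c(0, 300),
     main = "Monthly Rainfall in major cities",
     xlab = "Month of Year", ylab = "Rainfall (mm)", lwd = 2)
lines(rain$NewYork, type = "l", col = "blue", lwd = 2)
lines(rain$London, type = "l", col = "green", lwd = 2)
lines(rain$Berlin, type = "l", col = "orange", lwd = 2)

# legend() returns (invisibly) the box and text positions it used
leg <- legend("topright",
              legend = c("Tokyo", "NewYork", "London", "Berlin"),
              col = c("red", "blue", "green", "orange"),
              lty = 1, lwd = 2)
invisible(dev.off())
```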
How it works...


In the example code, we first created a graph with multiple lines using the plot() and
lines() commands to represent the monthly rainfall in Tokyo, New York, London, and Berlin
in four different colors. However, without a legend one would have no way of telling which line
represents which city. So we added a legend using the legend() function.

The first argument to the legend() function is the position of the legend, which we set to
"topright". Other possible values are "topleft", "top", "left", "center", "right",
"bottomleft", "bottom", and "bottomright". Then we specify the legend labels by
setting the legend argument to a vector of length 4 containing the names of the four cities.
The col argument specifies the colors of the legend, which should match the colors of the
lines in exactly the same order. Finally, the line type and width inside the legend are specified
by lty and lwd respectively.
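To get a feel for the keyword positions listed above, this sketch draws one small legend at each of the nine keywords on an empty plot, labeling each legend with its own keyword. The keyword vector is simply the list from the paragraph above.

```r
positions <- c("topleft", "top", "topright", "left", "center", "right",
               "bottomleft", "bottom", "bottomright")

pdf(NULL)
plot(1:10, type = "n", main = "legend() keyword positions")
# Draw each legend and keep its bounding box (left, top, width, height)
boxes <- lapply(positions, function(p)
  legend(p, legend = p, cex = 0.7)$rect)
invisible(dev.off())
```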


There's more... 


The placement and look of the legend can be modified in several ways. As a simple example, 
let's spread the legend across the top of the graph instead of the top right corner. So first, let's 
redraw the same base plot: 

plot(rain$Tokyo, type = "l", col = "red",
     ylim = c(0, 250),
     main = "Monthly Rainfall in major cities",
     xlab = "Month of Year",
     ylab = "Rainfall (mm)",
     lwd = 2)

lines(rain$NewYork, type = "l", col = "blue", lwd = 2)

lines(rain$London, type = "l", col = "green", lwd = 2)

lines(rain$Berlin, type = "l", col = "orange", lwd = 2)
Now, let's add a modified legend:

legend("top",
       legend = c("Tokyo", "NewYork", "London", "Berlin"),
       ncol = 4,
       cex = 0.8,
       bty = "n",
       col = c("red", "blue", "green", "orange"),
       lty = 1, lwd = 2)

[Figure: "Monthly Rainfall in major cities" with the legend spread in a single row across the top of the plot; x axis "Month of Year"]


We changed the legend location from "topright" to "top" and added a few other arguments
to adjust the look. The ncol argument is used to specify the number of columns over which
the legend is displayed. The default value is 1, as we saw in the first example. In our second
example, we set ncol to 4 so that all the city names are displayed in one single row. The
argument bty specifies the type of box drawn around the legend. We removed it from the
graph by setting it to "n". We also reduced the size of the legend labels by setting cex
to 0.8.


See also


There are plenty of examples of how you can add and customize legends in different
scenarios in later chapters.
Chapter 1


Creating graphs with maps 


In this recipe, we will learn how to plot data on maps. 


Getting ready 


In order to plot maps in R, we need to install the maps library. Here's how to do it:

install.packages("maps")

When you run this command, you will most likely be prompted by R to choose from a list of
CRAN mirror locations from which to download the library. For example, if you are based in
the UK, you can choose either the UK (Bristol) or UK (London) options.

Once the library is installed, we must load it using the library() command:

library(maps)



Note that we need to install any package using install.packages() only
once, but need to load it using library() or require() every time we
start a new session in R.
Let's add color:


map("world", fill = TRUE, col = heat.colors(10))




The maps library provides a way to project world data onto a low-resolution map. It is also
possible to make detailed maps of the United States. For example, we can make a map
showing the state boundaries as follows:

map("state", interior = FALSE)

map("state", boundary = FALSE, col = "red", add = TRUE)



The add argument is set to TRUE in the second call to map() to add details to the same
map created using the first call. It only works if a map has already been drawn on the current
graphics device.




























There's more... 
The previous examples are just a basic introduction to the idea of geographical visualization
in R. In order to plot any useful data, we need more detailed map data. GADM
(http://gadm.org) is a free spatial database of the location of the world's administrative
areas (or administrative boundaries). The site provides map information as native R objects
that can be plotted directly with the use of the sp library.

Let's take a look at a quick example. First we need to install and load the sp library, just like
we did with the maps library:

install.packages("sp")

library(sp)

GADM provides data for all the countries across the world. Let's load the data for Great 
Britain. We can do so by directly reading the data from the GADM website: 

load(url("http://gadm.org/data/rda/GBR_adm1.RData"))

This command loads the boundary data for the group of administrative regions forming Great 
Britain. It is stored in memory as a data object named gadm. Now let's plot a map with the 
loaded data: 

spplot(gadm, "Shape_Area")



The graph shows the different parts of Great Britain, color coded by their surface areas. We 
could just as easily display any other data such as population or crime rates. 
See also


We will cover more detailed and practical recipes with maps in Chapter 9, Creating Maps. 


Saving and exporting graphs 


In this recipe, we will learn how to save and export our graphs to various useful formats. 



To save a graph as an image file format such as PNG, we can use the png() command:

png("scatterplot.png")
plot(rnorm(1000))
dev.off()


The preceding command will save the graph as scatterplot.png in the current working
directory. Similarly, if we wish to save the graph as JPEG, BMP, or TIFF, we can use the jpeg(),
bmp(), or tiff() commands respectively.
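The same three steps can be tried without cluttering the working directory. This sketch writes to a temporary file from tempfile() and then checks that the file exists. Swapping png() for jpeg(), bmp(), or tiff() (and changing the extension) should work the same way, provided your R build supports those bitmap formats, which capabilities() reports.

```r
f <- tempfile(fileext = ".png")   # temporary path instead of the working directory
png(f)                            # open the PNG device
plot(rnorm(1000))                 # drawn into the file, not on screen
invisible(dev.off())              # close the device so the file is written
print(file.exists(f))
```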


If you are working under Windows, you can also save a graph using the graphical user
interface. First make your graph, make sure the graph window is the active window by clicking
anywhere inside it, and then click on File | Save as | Png, or the format of your choice, as
shown in the following screenshot:


[Screenshot: the R graphics window's File menu, with Save as expanded to show Metafile, Postscript, PDF, Png, Bmp, TIFF, and Jpeg]


When prompted to choose a name for your saved file, type a suitable name and click Save.
As you can see, you can choose from seven different formats.


























If you wish to use code to save and export your graphs, it is important to understand how
the code works. The first step in saving a graph is to open a graphics device suitable for the
format of your choice before you make the graph. For example, when you call the png()
function, you are telling R to start the PNG graphics device, so that the output of any
subsequent graph commands you run will be directed to that device. By default, the display
device on the screen is active, so any graph commands result in showing the graph on your
screen. But you will notice that when you choose a different graphics device such as png(),
the graphs don't show up on your screen. Finally, you must close the graphics device with the
dev.off() command to instruct R to finish the graph you plotted and write it to disk in the
specified format with the specified filename. If you do not run dev.off(), the file will not
be saved properly.
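The device switching described above can be watched with dev.cur(), which returns the name of the active device. This sketch uses pdf() rather than png() only because the PDF device is available in every R build; the open, plot, close pattern is the same.

```r
f <- tempfile(fileext = ".pdf")
pdf(f)                              # open a file device; it becomes the active device
active <- names(dev.cur())          # name of the device now receiving graphics
plot(rnorm(100))                    # goes into the PDF, nothing appears on screen
now_active <- names(dev.off())      # close it; dev.off() names the device that is active next
print(active)
print(now_active)
```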


There's more... 


You can specify a number of arguments to adjust the graph as per your needs. The simplest 
one that we've already used is the filename. You can also adjust the height and width settings 
of the graph: 


png("scatterplot.png",
    height = 600,
    width = 600)


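The default units for height and width are pixels, but you can also specify the units in inches,
cm, or mm. In that case png() also needs a resolution (the res argument, described next);
without one it stops with an error:


png("scatterplot.png",
    height = 4,
    width = 4,
    units = "in",
    res = 300)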


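The resolution of the saved image can be specified in dots per inch (dpi) using the
res argument:

png("scatterplot.png",
    res = 600)

Because width and height still default to 480 pixels here, such a high res leaves room for only
a tiny plot (R may even complain that the figure margins are too large), so in practice res is
combined with a width and height given in physical units.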
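Width, height, units, and res combine into the pixel size of the file: a 4 x 4 inch image at 300 ppi should come out at 1200 x 1200 pixels. This sketch checks that by reading the width and height fields of the PNG header (bytes 17 to 24 of the file, per the PNG format), so no extra package is needed; be_int() is a small helper defined here, not an R function.

```r
f <- tempfile(fileext = ".png")
png(f, width = 4, height = 4, units = "in", res = 300)
plot(rnorm(1000))
invisible(dev.off())

# A PNG starts with an 8-byte signature, then the IHDR chunk whose data
# begins with the image width and height as 4-byte big-endian integers
hdr <- readBin(f, what = "raw", n = 24)
be_int <- function(bytes) sum(as.integer(bytes) * 256^(3:0))
width_px  <- be_int(hdr[17:20])
height_px <- be_int(hdr[21:24])
print(c(width_px, height_px))   # 4 in x 300 ppi = 1200 pixels each way
```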


If you want your graphs saved in a vector format, you can also save them as a PDF file using
the pdf() function:


pdf("scatterplot.pdf")
plot(rnorm(1000))
dev.off()
Besides keeping your graphs sharp at any size, PDFs are also
useful because you can save multiple graphs in the same PDF file.
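Saving several graphs to one PDF needs nothing special: with pdf()'s default onefile = TRUE, each new high-level plot starts a new page in the same file. The sketch below writes three pages to a temporary file; the particular plots are arbitrary.

```r
f <- tempfile(fileext = ".pdf")
pdf(f)                    # onefile = TRUE by default: each new plot is a new page
plot(rnorm(100))          # page 1
hist(rnorm(100))          # page 2
boxplot(rnorm(100))       # page 3
invisible(dev.off())      # all three pages end up in the one file
```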


See also 


We will cover the details of saving and exporting graphs, especially for publication and
presentation purposes, in Chapter 10.
Beyond the Basics:
Adjusting Key 
Parameters 


In this chapter, we will cover: 

► Setting colors of points, lines, and bars 

► Setting plot background colors 

► Setting colors for text elements: axis annotations, labels, plot titles, and legends 

► Choosing color combinations and palettes 

► Setting fonts for annotations and titles 

► Choosing plotting point symbol styles and sizes 

► Choosing line styles and width 

► Choosing box styles 

► Adjusting axis annotations and tick marks 

► Formatting log axes 

► Setting graph margins and dimensions 


Introduction 


In this chapter, we will learn about some of the simplest yet most important settings and 
parameters of graphs in R base graphics. Learning how to adjust colors, sizes, margins, and 
styles of various graph elements such as points, lines, bars, axes, and titles will give us the 
ability to improve upon the basic graph commands we learnt in Chapter 1. 
In the previous chapter, we got a glimpse of the different types of graphs that can be made
in R using small snippets of code. Now, we will learn how to modify the fundamental building 
blocks of those graphs to better suit our needs. 

The R base library has very powerful graphical capabilities. While you can produce pretty
much any type of graph with a couple of lines of code, the default layout and look of the graph
is often very basic. You may run into problems such as axis labels and titles getting chopped
off at the edges, or a legend whose size or position masks part of your graph. The default
color combinations may also not be suitable for presentation or publication.

In this chapter we will go through the names and accepted values of the relevant
parameters and arguments to graph functions. We will take a closer look at the par()
function, which we briefly introduced in the previous chapter.

Reading and trying out all the recipes in this chapter is highly recommended as it will give you 
a very good hands-on grasp of certain aspects of graph manipulation, which you are likely to 
use a lot in any visual analysis in R. 

Let's get started! 


Setting colors of points, lines, and bars 


In this recipe we will learn the simplest way to change the colors of points, lines, and bars in 
scatter plots, line plots, histograms, and bar plots. 


Getting ready 


All you need to try out this recipe is to run R and type the recipe at the command prompt. 
You can also choose to save the recipe as a script so that you can use it again later on. 



The simplest way to change the color of any graph element is by using the col argument. For
example, the plot() function takes the col argument:


plot(rnorm(1000),
     col = "red")







Chapter 2 



[Figure: scatter plot of rnorm(1000) with red points; x axis "Index" from 0 to 1000]


If we choose plot type as line, then the color is applied to the plotted line. Let's use the
dailysales.csv example dataset we used in Chapter 1. First, we need to load it:

sales <- read.csv("dailysales.csv", header = TRUE)

plot(sales$units ~ as.Date(sales$date, "%d/%m/%y"),
     type = "l", # Specify type of plot as l for line
     col = "blue")
Similarly, the points() and lines() functions apply the col argument's value to the
plotted points and lines respectively.

barplot() and hist() also take the col argument and apply it to the bars. So the
following code would produce a bar plot with blue bars (here sales holds the city sales data
from Chapter 1, with City and ProductA columns, rather than the daily sales data loaded above):

barplot(sales$ProductA ~ sales$City,
        col = "blue")

The col argument for boxplot() is applied to the color of the boxes plotted.



The col argument automatically applies the specified color to the elements being plotted,
based on the plot type. So, if we do not specify a plot type or choose points, then the color
is applied to points. Similarly, if we choose plot type as line, then the color is applied to the
plotted line, and if we use the col argument in the barplot() or hist() commands,
then the color is applied to the bars.
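A compact way to see this is to pass col to several functions in turn. In this sketch the data are random numbers rather than the recipe's sales files; hist() and boxplot() also return their computed statistics, which the sketch keeps.

```r
x <- rnorm(200)

pdf(NULL)
plot(x, col = "red")                 # default type "p": colors the points
plot(x, type = "l", col = "blue")    # type "l": colors the line
h <- hist(x, col = "orange")         # colors the bars
b <- boxplot(x, col = "gray")        # colors the box
invisible(dev.off())
```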

col accepts names of colors such as "red", "blue", and "black". The colors() (or colours())
function lists all the built-in colors (more than 650) available in R. We can also specify colors
as hexadecimal codes such as #FF0000 (for red), #0000FF (for blue), and #000000 (for
black). If you have ever made any web pages, you will know that these hex codes are used
in HTML to represent colors.

col can also take numeric values. When it is set to a numeric value, the color corresponding
to that index in the current color palette is used. For example, in the default color palette the
first color is black and the second color is red. So col = 1 and col = 2 refer to black and red
respectively. Index 0 corresponds to the background color.
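The three ways of naming a color (name, hex code, palette index) can be compared with col2rgb(), which converts any of them to red, green, and blue values. Note that R 4.0.0 replaced the default palette, so on a current version palette()[2] is a softer red than "red"; the sketch therefore only checks index 1 (black) against the palette.

```r
all_names <- colors()                                # built-in color names
print(length(all_names))                             # more than 650
red_hex <- rgb(255, 0, 0, maxColorValue = 255)       # hex code for pure red
same_red <- all(col2rgb("red") == col2rgb(red_hex))  # name and hex agree

palette("default")
print(palette()[1:2])                    # what col = 1 and col = 2 refer to
index_one_black <- all(col2rgb(1) == col2rgb("black"))
```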


There's more... 


In many settings, col can also take a vector of multiple colors, instead of a single color. This
is useful if you wish to use more than one color in a graph. For example, in Chapter 1 we
made a bar plot of sales data for three products across five cities. In that example, we used
a vector of five colors to represent each of the five cities with the help of the heat.colors()
function. The heat.colors() function takes a number as an argument and returns a vector
of that many colors. So heat.colors(5) produces a vector of five colors.

Type the following at the R prompt:

heat.colors(5)

You should get the following output:

[1] "#FF0000FF" "#FF5500FF" "#FFAA00FF" "#FFFF00FF" "#FFFF80FF"

Those are five colors in the hexadecimal format. The last two digits of each code (FF here)
are the alpha, or opacity, channel; FF means fully opaque.
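heat.colors(n) always returns n codes, so pairing it with length() ties the number of colors to the number of bars. The sketch also uses rep_len() to spell out what recycling a four-color vector over five bars amounts to; it illustrates the recycling rule rather than barplot()'s internal code. Only Seattle and Mumbai come from the text; the three middle city names are placeholders.

```r
cities <- c("Seattle", "CityB", "CityC", "CityD", "Mumbai")  # middle names hypothetical
cols <- heat.colors(length(cities))      # exactly one color per bar

four <- c("red", "blue", "green", "orange")
recycled <- rep_len(four, length(cities))  # four colors stretched over five bars
print(recycled)   # "red" "blue" "green" "orange" "red"
```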
Another way of specifying a vector of colors is to construct one:

barplot(as.matrix(sales[, 2:4]), beside = TRUE,
        legend = sales$City,
        col = c("red", "blue", "green", "orange", "pink"),
        border = "white")

In the example, we set the value of col to c("red", "blue", "green", "orange",
"pink"), which is a vector of five colors.

We have to take care to make a vector matching the number of elements, in this
case the bars we are plotting. If the two numbers don't match, R will 'recycle' values by repeating
colors from the beginning of the vector. For example, if we had fewer colors in the vector than
the number of elements, say if we had four colors in the previous plot, then R would apply
the four colors to the first four bars and then apply the first color to the fifth bar. This is called
recycling in R. In the example, both the bars for the first and last data rows (Seattle and
Mumbai) would be of the same color (red), making it difficult to distinguish one from the other.

One good way to ensure that you always have the correct number of colors is to find out the
number of elements first and pass that as an argument to one of the color palette functions.
For example, if we did not know the number of cities in the example we have just seen, we
could do the following to make sure the number of colors matches the number of bars plotted:

barplot(as.matrix(sales[, 2:4]), beside = TRUE,
        legend = sales$City,
        col = heat.colors(length(sales$City)),
        border = "white")

We used the length() function to find out the length, or the number of elements, of the
vector sales$City and passed that as the argument to heat.colors(). So, regardless
of the number of cities, we will always have the right number of colors.


See also


In the next four recipes, we will see how to change the colors of other elements. The fourth
recipe of this chapter, Choosing color combinations and palettes, is especially useful.
Setting plot background colors


The default background color of all plots in R is white, which is usually the best choice as it is
least distracting for data analysis. However, sometimes we may wish to use another color. We
will see how to set background colors in this recipe.


Getting ready


All you need to try out this recipe is to run R and type the recipe at the command prompt.
You can also choose to save the recipe as a script so that you can use it again later on.


How to do it...


To set the plot background color to gray, we use the bg argument in the par() command:

par(bg = "gray")
plot(rnorm(100))


How it works...


The par() command's bg argument sets the background color for the entire plotting area,
including the margins, for any subsequent plots on the same device. Until the plotting device is
closed or a new device is initiated, the background color stays the same.


There's more...


It is more likely that we want to set the background color only for the plot region (within the
axes), but there is no straightforward way to do this in R. We must draw a rectangle of the
desired color in the background and then make our graph on top of it:

y <- rnorm(1000)
plot(y, type = "n")
x <- par("usr")

rect(x[1], x[3], x[2], x[4], col = "lightgray")
points(y)

First we draw the plot with type set to "n" so that the plotted elements are invisible. This
does not show the graph points or lines but sets the axes up, which we need for the next step.

par("usr") gets us the coordinates of the plot region in a vector of the form c(xleft,
xright, ybottom, ytop). We then use the rect() function to draw a rectangle with a fill
color that we wish to use for the plot background. Note that rect() takes a set of arguments
representing the xleft, ybottom, xright, ytop coordinates, so we must pass the values
we obtained from par("usr") in the correct order. Then, finally, we redraw the data with the
correct type (points or lines). Storing the values in y first ensures that the points drawn on
top are the same ones that set up the axes.
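Both approaches fit in a short script. This sketch restores the old bg setting after the first plot, and passes border = NA to rect() so the rectangle has no outline of its own; both are choices of this sketch, not of the recipe.

```r
pdf(NULL)
# Whole device background, margins included
op <- par(bg = "gray")
plot(rnorm(100))
par(op)

# Plot region only: set up axes, paint a rectangle, then draw the data
y <- rnorm(1000)
plot(y, type = "n")                 # axes only, no points
usr <- par("usr")                   # c(x1, x2, y1, y2) of the plot region
rect(usr[1], usr[3], usr[2], usr[4], col = "lightgray", border = NA)
points(y)                           # the same data drawn on top
invisible(dev.off())
```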
Setting colors for text elements: axis
annotations, labels, plot titles, and legends 


Axis annotations are the numerical or text values placed beside tick marks on an axis. 
Axis labels are the names or titles of axes, which tell the reader what the values on a 
particular axis represent. In this recipe, we will learn how to set the colors for these 
elements and legends. 


Getting ready 


All you need to try out this recipe is to run R and type the recipe at the command prompt. 
You can also choose to save the recipe as a script so that you can use it again later on. 


Let's say we want to make the axis value annotations blue, the axis labels red, and
the plot title dark blue. We do the following:

plot(rnorm(100),
     main = "Plot Title",
     col.axis = "blue",
     col.lab = "red",
     col.main = "darkblue")











Colors for axis annotations, labels, and plot titles can be set either using the par() command
before making the graph or in the graph command, such as plot(), itself. The arguments for
setting the colors for axis annotations, labels, and plot titles are col.axis, col.lab, and
col.main respectively.

They are similar to the c o I argument and take names of colors or hex codes as values, but do 
not take a vector of more than one color. 


There's more... 


If we use the par() command, the difference is that par() will apply these settings to every
subsequent graph, until they are reset either by specifying the settings again or by starting a new
graphics device:

par(col.axis = "black",
    col.lab = "#444444",
    col.main = "darkblue")

plot(rnorm(100), main = "plot")
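The persistence is easy to check: settings made with par() stay in force for every following plot until they are changed back. This sketch keeps the old values that par() returns and restores them at the end, then compares colors with col2rgb() so the check does not depend on how R spells a color name.

```r
pdf(NULL)
op <- par(col.axis = "black", col.lab = "#444444", col.main = "darkblue")
plot(rnorm(100), main = "plot")           # uses the colors set above
plot(runif(100), main = "another plot")   # ...and so does the next plot
main_during <- par("col.main")
par(op)                                   # back to the previous settings
main_after <- par("col.main")
invisible(dev.off())
```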

The col.axis argument can also be passed to the axis() function, which is useful for
making a custom axis if you do not want to use the default axis. The col.lab argument does
not work with axis() and must be specified in par() or the main graph function, such as
plot() or barplot().

The col.main argument can also be passed to the title() function, which is useful for
adding a custom plot title if you do not want to use the default title:

title("Sales Figures for 2010", col.main = "blue")

Axis labels can also be specified with title():

title(xlab = "Month", ylab = "Sales", col.lab = "red")

This is handy because you can specify two different colors for the X and Y axes:

title(xlab = "X axis", col.lab = "red")

title(ylab = "Y axis", col.lab = "blue")

When setting the axis titles with the title() command, we must set xlab and ylab to
empty strings ("") in the original plot command to avoid overlapping titles.
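Put together, the pattern looks like this sketch: blank axis titles in plot(), then one title() call per axis, each with its own col.lab, and one for the main title.

```r
pdf(NULL)
plot(rnorm(100), xlab = "", ylab = "")            # leave the axis titles empty...
title(xlab = "X axis", col.lab = "red")           # ...then add each in its own color
title(ylab = "Y axis", col.lab = "blue")
title("Sales Figures for 2010", col.main = "blue")
closed <- names(dev.off())
```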
Choosing color combinations and palettes


We often need more than one color to represent various elements in graphs. Palettes are 
combinations of colors which are a convenient way to use multiple colors without choosing 
individual colors separately. R provides inbuilt color palettes as well as the ability to make 
our own custom palettes. Using palettes is a good way to avoid repeatedly choosing or 
setting colors in multiple locations, which can be a source of error and confusion. It helps in 
separating the presentation settings of a graph from the construction. 


Getting ready 


All you need to try out this recipe is to run R and type the recipe at the command prompt. You 
can also choose to save the recipe as a script so that you can use it again later on. One new 
library needs to be installed, which is also explained. 


How to do it... 


We can change the current palette by passing a character vector of colors to the palette()
function. For example:

palette(c("red", "blue", "green", "orange"))

To use the colors in the current palette, we can refer to them by the index number. For
example, palette()[1] would be "red".


How it works... 


R has a default palette of colors, which can be accessed by calling the palette() function.
If we run the palette() command just after starting R, we get the default palette:

palette()

[1] "black"   "red"     "green3"  "blue"    "cyan"    "magenta" "yellow"
[8] "gray"

(This is the default palette in the R versions this book was written for; since R 4.0.0 the
default palette is a different set of eight colors.)

To revert to the default palette, type:

palette("default")

When a vector of color names is passed to the palette() function, it sets the current
palette to those colors. We must enter valid color names; otherwise we will get an invalid
color name error.
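In this sketch a custom palette is set, numeric col values are used against it, and the default is restored. palette() returns the previous palette invisibly when you set a new one, which is why the result is stored as old; the check uses col2rgb() so it does not depend on how R spells the color.

```r
old <- palette(c("red", "blue", "green", "orange"))  # old palette returned invisibly

pdf(NULL)
plot(1:4, col = 1:4, pch = 19, cex = 3)   # col = 1 is now red, 2 blue, and so on
invisible(dev.off())

second <- palette()[2]
palette("default")                        # back to the default palette
```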
There's more...


Besides the default palette provided by the palette() function, R has many more built-in
palettes and additional palette libraries. One of the most commonly used palettes is the
heat.colors() palette, which provides a range of colors from red through yellow to white,
based on the number of colors specified by the argument n. For example, heat.colors(10)
produces a palette of 10 warm colors from red to white.

Other palettes are rainbow(), terrain.colors(), cm.colors(), and topo.colors(), which also
take the number of colors as an argument.
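To compare these built-in palettes side by side, we can draw each one as a strip of colored bars. This is a minimal sketch. The choice of 10 colors and the bar-strip layout are illustrative assumptions.

```r
n <- 10
# Each built-in palette function returns a character vector of n colors
palettes <- list(
  heat    = heat.colors(n),
  rainbow = rainbow(n),
  terrain = terrain.colors(n),
  topo    = topo.colors(n),
  cm      = cm.colors(n)
)

# One row per palette, with the palette name in the left margin
op <- par(mfrow = c(length(palettes), 1), mar = c(0.5, 6, 0.5, 0.5))
for (name in names(palettes)) {
  barplot(rep(1, n), col = palettes[[name]], border = NA,
          axes = FALSE, space = 0)
  mtext(name, side = 2, las = 1, line = 1)
}
par(op)   # restore the previous layout and margins
```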

RColorBrewer is a very good color palette package that creates nice looking color palettes,
especially for thematic maps. It is an R implementation of the ColorBrewer palettes, which
provide three types of palettes: sequential, diverging, and qualitative. More information is
available at http://www.colorbrewer.org.

To use RColorBrewer, we need to install and load it:

install.packages("RColorBrewer")

library(RColorBrewer)

To see all the RColorBrewer palettes, run the following command at the R prompt:
display.brewer.all()
The names of the palettes are displayed in the left-hand margin and the colors in each palette
are displayed in each row running to the right. 

To use one of the palettes, let's say YlOrRd (which, as the name suggests, is a combination
of yellows and reds), we can use the brewer.pal() function:

brewer.pal(7, "YlOrRd")

[1] "#FFFFB2" "#FED976" "#FEB24C" "#FD8D3C" "#FC4E2A" "#E31A1C" "#B10026"

The brewer.pal() function takes two arguments: the number of colors we wish to choose and
the name of the palette. The minimum number of colors is three, but the maximum varies from
palette to palette.

We can view the colors of an individual palette by using the display.brewer.pal()
command:

display.brewer.pal(7, "YlOrRd")

To use a specific color of the palette, we can refer to it by its index number. So the first color
in the palette is brewer.pal(7, "YlOrRd")[1], the second is brewer.pal(7, "YlOrRd")[2],
and so on.

We can make this palette the current palette by using the palette() function:
palette(brewer.pal(7, "YlOrRd"))

Now we can refer to the individual colors as palette()[1], palette()[2], and so on. We
can also store the palette as a vector:

pal1 <- brewer.pal(7, "YlOrRd")
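Base R can also build custom palettes without any extra package. colorRampPalette() returns a function that interpolates between the colors we give it. The sketch below produces a simple yellow-to-red sequential palette. It is an illustrative choice and not the exact YlOrRd colors.

```r
# colorRampPalette() returns a function that interpolates between the given colors
yellow_to_red <- colorRampPalette(c("yellow", "red"))
pal2 <- yellow_to_red(7)             # 7 colors from yellow to red

palette(pal2)                        # make it the current palette
barplot(1:7, col = 1:7)              # integer colors now index into pal2
palette("default")                   # restore the default palette
```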


See also 


We will see the use of a lot of color palettes throughout the recipes in this book starting from 
Chapter 3, Creating Scatter Plots. 


Setting fonts for annotations and titles 


For most data analysis we can just use the default fonts for titles. However, sometimes we 
may want to choose different fonts for presentation and publication purposes. Selecting fonts 
can be tricky as it depends on the operating system and the graphics device. We will see 
some simple ways to choose fonts in this recipe. 
Getting ready


All you need to try out this recipe is to run R and type the recipe at the command prompt. You 
can also choose to save the recipe as a script so that you can use it again later on. 


How to do it... 


The font family and face can be set with the par() command:
par(family="serif", font=2)


How it works... 


A font is specified in two parts: a font family (such as Helvetica or Arial) and a font face
within that family (such as bold or italic).

The available font families vary by operating system and graphics device, so R provides some
generic family names that are mapped onto suitable available fonts regardless of the system.
The standard values for family are "serif", "sans", and "mono".

The font argument takes numerical values: 1 corresponds to plain text (the default), 2 to bold 
face, 3 to italic, and 4 to bold italic. 

For example, par(family="serif", font=2) sets the font to a bold Times New Roman
on Windows. You can check the other font mappings by running the windowsFonts()
command at the R prompt (this function is available only on Windows).

The fonts for axis annotations, axis labels, and the plot's main title can be set separately
using the font.axis, font.lab, and font.main arguments respectively.
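The sketch below combines these settings. par() sets the family and the default face, and the plot call overrides the faces of the title and axis labels. The particular face codes are only illustrative.

```r
# Serif family and bold face as the defaults for text in this device
op <- par(family = "serif", font = 2)
plot(rnorm(50),
     main = "Serif title in bold italic",
     xlab = "Index", ylab = "Value",
     font.main = 4,   # bold italic main title
     font.lab = 3)    # italic axis labels
current_family <- par("family")
par(op)   # restore the previous family and font
```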


There's more... 


The choice of fonts is very limited if we just use the proxy family names. However, we can use
a wide range of fonts if we are exporting our graphs in the PostScript or PDF formats. The
postscriptFonts() and pdfFonts() functions show all the available fonts for those
devices. To see the PDF fonts, run the following command:

names(pdfFonts())

"serif"                "sans"                 "mono"
"AvantGarde"           "Bookman"              "Courier"
"Helvetica"            "Helvetica-Narrow"     "NewCenturySchoolbook"
"Palatino"             "Times"                "URWGothic"
"URWBookman"           "NimbusMon"            "NimbusSan"
"URWHelvetica"         "NimbusSanCond"        "NimbusRom"
"URWPalladio"          "Japan1"               "Japan1HeiMin"
"Japan1Ryumin"         "Korea1"               "CNS1"
"GB1"


To use one of these font families in a PDF, we can pass the family argument to the
pdf() function:

pdf(family="AvantGarde")
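A complete export also names the output file and closes the device afterwards. In this sketch, writing to tempfile() and using a 7x5 inch page are assumptions made to keep the example self-contained.

```r
# Write to a temporary file so the example does not clutter the working directory
out_file <- tempfile(fileext = ".pdf")

pdf(out_file, family = "AvantGarde", width = 7, height = 5)
plot(rnorm(100), main = "Random normal values in AvantGarde")
dev.off()   # close the device so the file is written
```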


See also 


In Chapter 10, Finalizing Graphs, we will see some more practical recipes on setting fonts for 
publications and presentations. 


Choosing plotting point symbol styles and sizes


In this recipe, we will see how we can adjust the styling of plotting symbols, which is useful 
and necessary when we plot more than one set of points representing different groups of 
data on the same graph. 


Getting ready 


All you need to try out this recipe is to run R and type the recipe at the command prompt. You 
can also choose to save the recipe as a script so that you can use it again later on. We will 
also use the cityrain.csv example data file that we used in the first chapter. Please read 
the file into R as follows: 

rain <- read.csv("cityrain.csv")


How to do it... 


The plotting symbol and size can be set using the pch and cex arguments:
plot(rnorm(100), pch=19, cex=2)


How it works...


The pch argument stands for plotting character (symbol). It can take numerical values (usually
between 0 and 25) as well as single character values. Each numerical value represents a
different symbol. For example, 1 represents circles, 2 represents triangles, 3 represents plus
signs, and so on. If we set the value of pch to a character such as "*" or "£" in inverted
commas, then the data points are drawn as that character instead of the default circles.

The size of the plotting symbol is controlled by the cex argument, which takes numerical
values starting at 0 giving the amount by which plotting symbols should be magnified relative
to the default. Note that cex takes relative values (the default is 1). So, the absolute size may
vary depending on the defaults of the graphics device in use. For example, the size of plotting
symbols with the same cex value may be different for a graph saved as a PNG file versus a
graph saved as a PDF.
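A quick way to see every numeric symbol is to draw codes 0 to 25 in a grid with each code printed below its symbol. The helper show_pch() below is hypothetical and is written only for this illustration. Setting bg = "gray" fills symbols 21 to 25, which accept a separate fill color.

```r
# Hypothetical helper: draw plotting symbols 0-25 with their codes underneath
show_pch <- function(cex = 2) {
  codes <- 0:25
  x <- codes %% 9          # 9 symbols per row
  y <- -(codes %/% 9)      # rows go downwards
  plot(x, y, pch = codes, cex = cex, bg = "gray",
       xlim = c(-0.5, 8.5), ylim = c(-3, 0.5),
       axes = FALSE, xlab = "", ylab = "")
  text(x, y - 0.35, labels = codes, cex = 0.8)
  invisible(codes)
}

show_pch()
```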


There's more... 


The most common use of pch and cex is when we don't want to use color to distinguish
between different groups of data points. This is often the case with scientific journals that do
not accept color images. For example, let's plot the city rainfall data we looked at in Chapter 1
as a set of points instead of lines:

plot(rain$Tokyo,
ylim=c(0,250),
main="Monthly Rainfall in major cities",
xlab="Month of Year",
Choosing line styles


Getting ready


All you need to try out this recipe is to run R and type the recipe at the command prompt. You
can also choose to save the recipe as a script so that you can use it again later on. We will
again use the cityrain.csv data file that we read in the last recipe.


How to do it...


Line styles can be set by using the lty and lwd arguments (for line type and width
respectively) in the plot(), lines(), and par() commands. Let's take our rainfall
example and apply different line styles keeping the color the same:

plot(rain$Tokyo,
ylim=c(0,250),
main="Monthly Rainfall in major cities",
xlab="Month of Year",
ylab="Rainfall (mm)",
type="l",
lty=1,
lwd=2)

lines(rain$NewYork,lty=2,lwd=2)
lines(rain$London,lty=3,lwd=2)
lines(rain$Berlin,lty=4,lwd=2)

legend("top",
legend=c("Tokyo","New York","London","Berlin"),
ncol=4,
cex=0.8,
bty="n",
lty=1:4,
lwd=2)


How it works...



Both line type and width can be set with numerical values as shown in the previous example. 
Line type number values correspond to types of lines: 

► 0: blank

► 1: solid (default)

► 2: dashed

► 3: dotted

► 4: dotdash

► 5: longdash

► 6: twodash

We can also use the character strings instead of numbers, for example, lty="dashed"
instead of lty=2.

The line width argument lwd takes positive numerical values. The default value is 1. In the
example we used a value of 2, thus making the lines thicker than the default.
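The sketch below draws one horizontal line for each named line type so the styles can be compared directly. It also shows that a numeric lty set through par() is reported back by its name. The layout and label size are illustrative choices.

```r
lty_names <- c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")

# One horizontal line per type, using the character form of lty
plot(NULL, xlim = c(0, 10), ylim = c(0.5, 6.5),
     xlab = "", ylab = "", yaxt = "n")
for (i in seq_along(lty_names)) {
  abline(h = i, lty = lty_names[i], lwd = 2)
}
axis(2, at = seq_along(lty_names), labels = lty_names, las = 1, cex.axis = 0.8)

# A numeric code set with par() is reported back as its name
par(lty = 2)
numeric_as_name <- par("lty")
par(lty = "solid")   # back to the default
```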


See also 


We will explore more examples of line styles in subsequent chapters, especially Chapter 4, 
Creating Line Graphs and Time Series Charts in which we will see some advanced line 
graph recipes. 


Choosing box styles 


The styles of various boxes drawn in a graph such as the one around the plotting region and 
the legend can be adjusted in a similar way to the line styles we saw in the last recipe. 


Getting ready 


All you need to try out this recipe is to run R and type the recipe at the command prompt. You 
can also choose to save the recipe as a script so that you can use it again later on. 









How to do it...


Let's say we want to make an L-shaped box around a graph, such that the default top and right
borders are not drawn. We can do so using the bty argument in the par() command:

par(bty="l")
plot(rnorm(100))


How it works...


The bty argument stands for box type and takes single characters in inverted commas as
values. The resulting box resembles the shape of the corresponding character (in upper case
for letters). For example, the default value is "o", which gives a box with all four edges.
Other possible values are "l", "7", "c", "u", and "]". If we do not wish to draw a box at all,
we can set bty to "n".


Note that setting bty to "n" doesn't suppress the drawing of axes. If we wish
to suppress those too, then we would also have to set xaxt and yaxt to
"n". Alternatively, we can simply set the axes argument to FALSE in the
plot() function call.
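To see all the box types at once, we can draw a small grid of plots with a different bty value in each panel. In this sketch, the 2x4 layout and the margins are illustrative choices. A bty value passed to plot() applies only to that call, so the value set with par() stays in place.

```r
box_types <- c("o", "l", "7", "c", "u", "]", "n")

op <- par(mfrow = c(2, 4), mar = c(2, 2, 2, 1), bty = "l")
default_bty <- par("bty")   # the device-wide setting made with par() above

# bty passed to plot() overrides the par() setting for that call only;
# note that axes are still drawn when bty = "n"
for (b in box_types) {
  plot(1:10, bty = b, main = paste0('bty = "', b, '"'))
}
par(op)   # restore the previous layout, margins, and box type
```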
There's more...


Box styles can be controlled in a finer way using the box() command. In addition to the lty
and lwd arguments, we can also specify where the box should be drawn using the which
argument, which can take the values "plot", "figure", "inner", and "outer".

Let's say we want to draw a graph with an L-shaped box for the plot area and a full box around
the figure, including the axis annotations and titles. We can do this as follows:
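The book's own code for this step is not reproduced here. The following is a minimal sketch of the approach described. The one-line outer margin and the dashed figure box are assumptions chosen for illustration.

```r
# Outer margins of 1 line on each side so the figure box is not clipped
op <- par(oma = c(1, 1, 1, 1))
plot(rnorm(100), bty = "l")       # L-shaped box around the plot region
box(which = "figure", lty = 2)    # dashed box around the whole figure region
outer_margins <- par("oma")
par(op)                           # restore the previous outer margins
```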


Note that we had to first set the outer margins by setting the oma argument with the par()
function. We will learn more about this argument later in this chapter. If we did not set the
outer margins, the box around the figure would be right at the edge of the plot and get cut off,
because the default outer margins are zero.
Adjusting axis annotations and tick marks


The default axis settings are often not adequate to deal with all kinds of data. For example, 
we may wish to change the number of tick marks along an axis or change the orientation of 
the annotations if they are too long to fit horizontally. In this recipe we will cover some settings 
which can be used to customize axes as per our requirements. 


Getting ready 


All you need to try out this recipe is to run R and type the recipe at the command prompt. 
You can also choose to save the recipe as a script so that you can use it again later on. 


How to do it...


We can set the xaxp and yaxp arguments with the par() command to specify the co-ordinates
of the extreme tick marks and the number of intervals between tick marks in the form
c(min, max, n).

plot(rnorm(100), xaxp=c(0,100,10))



How it works...


When xaxp or yaxp is not specified, R automatically calculates the number of tick marks and
their values. By default, R extends the axis limits by adding 4% at each end and then draws an
axis which fits within the extended range. This means that even if we set the axis limits using
xlim or ylim, the graph corners don't exactly correspond with those values. To make sure
they do, we need to change the axis style using the xaxs argument (or yaxs for the y axis),
which takes one of two possible values: "r" (regular, the default) and "i" (internal). We need
to set xaxs to "i".

In short, xaxp and yaxp take a vector of the form c(x1, x2, n), giving the co-ordinates of the
extreme tick marks and the number of intervals between tick marks.
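The difference between the two axis styles shows up in par("usr"), which holds the actual extent of the plotting region. The sketch below uses made-up data (y = x squared) purely for illustration.

```r
x <- 1:10
y <- x^2

# Default style "r": the axis range is extended by 4% at each end
plot(x, y, xlim = c(0, 10))
usr_regular <- par("usr")[1:2]

# Style "i": the axis range matches xlim exactly
plot(x, y, xlim = c(0, 10), xaxs = "i")
usr_internal <- par("usr")[1:2]
```

With a range of 10, the default style adds 0.4 at each end, giving an x extent of -0.4 to 10.4 instead of 0 to 10.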


There's more... 


To change the orientation of axis value annotations, we need to set the las argument of the
par() command. It takes one of four possible numeric values:

► 0: always parallel to the axis (default) 

► 1: always horizontal 

► 2: always perpendicular to the axis 

► 3: always vertical 

We can also use the axis() command to make a custom axis by specifying a number of
arguments. The basic arguments are:

► side, which takes numeric values (1=below, 2=left, 3=above, and 4=right)

► at, which takes a vector of co-ordinates where tick marks are to be drawn

► labels, which takes a vector of tick mark annotations


We can separately set the line width for the axis lines and the tick marks by passing the lwd
and lwd.ticks arguments respectively. Similarly, colors can be set using the col and
col.ticks arguments.
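The sketch below combines these arguments into one custom axis. The default x axis is suppressed with xaxt = "n". A new one is then drawn with ticks at the start of each quarter, month names as labels, perpendicular text, and thicker blue tick marks. The random y values are placeholders only.

```r
set.seed(1)
y <- rnorm(12)   # placeholder data for illustration

# xaxt = "n" suppresses the default x axis; las = 1 keeps y labels horizontal
plot(1:12, y, xaxt = "n", las = 1, xlab = "Month", ylab = "Value")

# Custom x axis; axis() invisibly returns the tick locations it used
ticks <- axis(side = 1, at = c(1, 4, 7, 10),
              labels = month.abb[c(1, 4, 7, 10)], las = 2,
              lwd.ticks = 2, col.ticks = "blue")
```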


See also 


We will come across various examples of custom axes in the following chapters as we explore 
more advanced recipes. 






Formatting log axes 
There's more...


We can also set scales to be logarithmic by setting the xlog and ylog arguments to TRUE
with the par() command. This can be handy if we wish to have the same setting for multiple
plots, as par() applies the settings to all subsequent plots on the same device.

Note that zero and negative values cannot be shown on a logarithmic scale; R omits them
from the plot with a warning.
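For a single plot, the log argument of plot() is the most direct way to get a logarithmic axis. The sketch below uses exponential data (an illustrative choice), which becomes a straight line on a log y axis. Afterwards, par("ylog") reports that the y axis is logarithmic.

```r
x <- 1:100
y <- exp(x / 10)

# log = "y" puts only the y axis on a logarithmic scale
plot(x, y, log = "y", main = "Exponential growth on a log y axis")
y_is_log <- par("ylog")
x_is_log <- par("xlog")
```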


Setting graph margins and dimensions 


In this recipe we will learn how to adjust graph margins and dimensions. 


Getting ready 


All you need to try out this recipe is to run R and type the recipe at the command prompt. 
You can also choose to save the recipe as a script so that you can use it again later on. 


How to do it... 


We can use the fin and pin arguments of the par() command to set the figure region and
plot dimensions:

par(fin=c(6,6),
pin=c(4,4))

We can use the mai and omi arguments to adjust the inner and outer margins respectively:

par(mai=c(1,1,1,1),
omi=c(0.1,0.1,0.1,0.1))


How it works... 


All the previous arguments accept values in inches: fin and pin take a pair of width and
height values, while mai and omi take four values for the bottom, left, top, and right margins.
The default values for fin and pin are approximately 7x7 and 5.75x5.15 (on a 7x7 inch
device). We have to be careful not to specify bigger values for pin than fin, or we will get
an error.

Adjusting fin and pin is one way of setting the figure margins containing the axis
annotations and labels. Another way is to use the mai or mar arguments. In the example,
we used mai, which takes a vector value in inches, whereas mar takes a vector of numerical
values in terms of the number of lines of margin text. It is better to use mar or mai because
they adjust the figure margins irrespective of the figure or plot size.
We can also set an outer margin, which is zero by default. This margin is useful if we
wish to contain the entire graph, including axis labels, within a box, as we saw in an earlier
recipe. Like figure margins, outer margins can be set in inches with omi or in number of lines
of text using oma.
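The sketch below sets margins first in lines of text and then in inches, and it restores the previous settings afterwards. The specific margin values are arbitrary illustrations.

```r
old_mar <- par("mar")
old_omi <- par("omi")

# mar is in lines of text: bottom, left, top, right
par(mar = c(4, 4, 2, 1))
mar_now <- par("mar")
plot(rnorm(50), main = "Margins set in lines")

# mai is the same in inches; omi adds a small outer margin in inches
par(mai = c(1, 1, 0.5, 0.25), omi = c(0.2, 0.2, 0.2, 0.2))
mai_now <- par("mai")
plot(rnorm(50), main = "Margins set in inches")

par(mar = old_mar, omi = old_omi)   # restore the previous margins
```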

R Graphics by Paul Murrell is an excellent reference with visual explanations of how margins
work in R. See the book homepage for more details:
http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html.


This talk by Paul Murrell also contains figures from the book explaining the same concepts:
http://www.stat.auckland.ac.nz/~paul/Talks/Rgraphics.pdf.


See also 


We will come across examples of figure margin settings in some of the recipes in the 
following chapters. 






Creating Scatter Plots 


In this chapter, we will cover: 

► Grouping data points within a scatter plot 

► Highlighting grouped data points by size and symbol type 

► Labelling data points 

► Correlation matrix using pairs plot 

► Adding error bars 

► Using jitter to distinguish closely packed data points 

► Adding linear model lines 

► Adding non-linear model curves 

► Adding non-parametric model curves with lowess 

► Making three-dimensional scatter plots 

► Making Quantile-Quantile plots 

► Displaying data density on axes 

► Making scatter plots with smoothed density representation 


Introduction 


In this chapter, we will learn about scatter plots in depth by looking at some advanced recipes.
Scatter plots are one of the most commonly used types of graphs in data analysis. In the first
chapter we learnt how to make a basic scatter plot. Now we will see how we can make more
enhanced plots by adjusting various arguments and using some new functions.






So far, we have mostly only used base graphics functions such as plot(), but in this
chapter we have recipes that use other graph libraries such as lattice and ggplot2, which
offer more advanced control over graphs. It is possible to make these advanced graphs using
the base library too, but the additional libraries give us ways to achieve the same results with
less code and often produce better looking graphs with the least amount of effort.

A lot of new functions will be introduced in this chapter. It is good practice to look up the
help file whenever you encounter a new function. For example, to look up the help file for the
plot() function, you can type ?plot or help(plot) at the R command prompt.

As the recipes in this chapter are slightly more advanced than the earlier chapters, it may take 
some practice with multiple datasets before you are comfortable with using all the functions. 
Example datasets are used in each recipe, but it is highly recommended to also work with 
your own datasets and modify the recipes to suit your own analysis. 


Grouping data points within a scatter plot 


A basic scatter plot has a set of points plotted at the intersection of their values along X and 
Y axes. Sometimes, we may wish to further distinguish between these points based on 
another value associated with the points. In this recipe we will see how we can group data 
points using color. 


Getting ready 


To try out this recipe, start R and type the recipe at the command prompt. You can also 
choose to save the recipe as a script so that you can use it again later on. 

We will also need the lattice and ggplot2 packages. The lattice package is included
automatically in the base R installation, but we will need to install the ggplot2 package.

To do this, run the following command at the R prompt:

install.packages("ggplot2")



How to do it...


As a first example, let's use the xyplot() command of the lattice library:
library(lattice)

xyplot(mpg~disp,
data=mtcars,
groups=cyl,
auto.key=list(corner=c(1,1)))
How it works...


In the example, we used the xyplot() command to plot mpg versus disp from the
pre-loaded mtcars dataset. We will understand this better if we look at the actual dataset.
Type mtcars at the R prompt and hit Enter. Let's look at a sample of the data to see the
row names and first three columns of data:

mtcars[1:6,1:3]

                   mpg cyl disp
Mazda RX4         21.0   6  160
Mazda RX4 Wag     21.0   6  160
Datsun 710        22.8   4  108
Hornet 4 Drive    21.4   6  258
Hornet Sportabout 18.7   8  360
Valiant           18.1   6  225
So we plotted mpg against disp, but we also used the groups argument to group the data
points by cyl. That tells xyplot() that we would like to highlight the data points by different
colors based on the number of cylinders (cyl) each car has. Finally, the auto.key argument
is set to add a legend so that we know what values of cyl each color represents. The
auto.key argument can take a list of values. The only one we have provided here is the
location given by the corner argument, which we set to c(1,1), representing the top-right
corner. We can also simply set auto.key to TRUE, which will draw the legend in the top
margin outside the plotting area.
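For comparison, the same grouping can be done with base graphics by turning cyl into a factor and using it to index a vector of colors. This sketch is not from the recipe. The three colors and the legend position are arbitrary choices.

```r
cyl_groups <- factor(mtcars$cyl)                  # 4, 6, 8 as categories
group_cols <- c("red", "darkgreen", "blue")[cyl_groups]

plot(mpg ~ disp, data = mtcars, col = group_cols, pch = 19)
legend("topright", legend = levels(cyl_groups),
       col = c("red", "darkgreen", "blue"), pch = 19, title = "cyl")
```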


There's more... 


The xyplot() function has slightly obscure arguments. If you look at the help file on
xyplot() (by running ?xyplot), you will see that there are a lot of arguments which can
be used to control many different aspects of the graph. A simpler alternative to xyplot() is
using the functions from the ggplot2 package. Let's draw the same plot using ggplot2:

library(ggplot2)

qplot(disp, mpg, data=mtcars, col=as.factor(cyl))
First we load the ggplot2 library and then use the qplot() function to make the previous
graph. We passed disp and mpg as the x and y variables respectively (note we can't use the
y~x notation in qplot). To group by cyl, all we had to do was set the col argument to cyl.
This tells qplot that we want to group the points based on the values of cyl and represent
them by different colors. The legend is automatically drawn to the right.

Note that we set col to as.factor(cyl) and not just cyl. This is to make sure that cyl
is read as a factor (or categorical value). If we just use cyl, the points are plotted in the same
positions, but the color scale and legend cover the continuous range between 4 and 8,
because cyl is treated as a numerical variable.

Thus, it is easier and more intuitive to produce a better looking graph with ggplot2.
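Recent versions of ggplot2 have deprecated qplot() in favor of the ggplot() function. The sketch below is an equivalent of the previous graph written with ggplot(). Here aes() maps disp and mpg to the axes and the cylinder factor to color. The legend title set with labs() is an illustrative addition.

```r
library(ggplot2)

# aes() maps disp to x, mpg to y, and the cylinder factor to colour
p <- ggplot(mtcars, aes(x = disp, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(colour = "cyl")
print(p)
```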


See also 


We will use ggplot2 to group data points by size and symbol instead of color
in the next recipe.


Highlighting grouped data points by size and symbol type


Sometimes we may not want to use different colors to represent different groups of data 
points. For example, some journals accept graphs only in grayscale. In this recipe, we will 
see how we can highlight grouped data points by symbol size and type. 


Getting ready 


We will use the ggplot2 library, so let's load it by running the following command:
library(ggplot2)
How to do it...


First, let's group points by symbol type. Once again we use the qplot() function:
qplot(disp, mpg, data=mtcars, shape=as.factor(cyl))



Next, let's group the points simply by the size of the plotting symbol:
qplot(disp, mpg, data=mtcars, size=as.factor(cyl))
How it works...


Highlighting groups of points by symbol type and size works exactly like color when using the
qplot() function. Instead of the col argument, we used the shape and size arguments
and set them to the factor we want to group the points by (in this case cyl). We can also use
combinations of any of these arguments. For example, we could use color to represent cyl
and size to represent gear.
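For grayscale output with base graphics, the same idea works by indexing a vector of symbol codes with the grouping factor. In this sketch, the choice of open circle, open triangle, and filled square is an assumption made to keep the groups distinguishable in black and white.

```r
cyl_groups <- factor(mtcars$cyl)
# One symbol per group: open circle (4), open triangle (6), filled square (8)
group_pch <- c(1, 2, 15)[cyl_groups]

plot(mpg ~ disp, data = mtcars, pch = group_pch, col = "black")
legend("topright", legend = levels(cyl_groups), pch = c(1, 2, 15), title = "cyl")
```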


Labelling data points 


In this recipe, we will learn how to label individual or multiple data points with text. 


Getting ready 


For this recipe, we don't need to load any additional libraries. We just need to type the recipe 
at the R prompt or run it as a script. 














There's more... 



We can also use the text() function to label all the data points in a graph, instead of
just one or two. Let's look at another example where we wish to plot the life expectancy
in countries versus their health expenditure. Instead of representing the data as points,
let's use the names of countries to represent the values. We will use the example dataset
HealthExpenditure.csv:

health <- read.csv("HealthExpenditure.csv", header=TRUE)

plot(health$Expenditure, health$Life_Expectancy, type="n")

text(health$Expenditure, health$Life_Expectancy, health$Country)


We first use the plot() command to make a graph of life expectancy versus expenditure. Note
that we set type equal to "n", which means that only the graph layout and axes are drawn
but no data points are drawn. Then we use the text() function to place country names as
labels at the x-y locations of all the data points. Thus, text() accepts vectors as values for
(x, y) and labels to dynamically label all the data points with the corresponding country names.
In case the text labels overlap, we can use the jitter() function or remove some labels to
reduce the overlap.
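The same pattern can be tried without the HealthExpenditure.csv file. This sketch uses the built-in mtcars data instead and labels every point with its car model. It then highlights a single point, the car with the highest mpg. The label size and the red circle are illustrative choices.

```r
plot(mpg ~ wt, data = mtcars, type = "n",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
# Use the row names (car models) in place of plotting symbols
text(mtcars$wt, mtcars$mpg, labels = rownames(mtcars), cex = 0.6)

# Highlight a single point: the car with the highest mpg
best <- which.max(mtcars$mpg)
points(mtcars$wt[best], mtcars$mpg[best], col = "red", cex = 2)
best_name <- rownames(mtcars)[best]
```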
Correlation matrix using pairs plot


In this recipe, we will learn how to create a correlation matrix, which is a handy way of quickly 
finding out which variables in a dataset are correlated with each other. 


Getting ready 


To try out this recipe, simply type it at the command prompt. You can also choose to save the 
recipe as a script so that you can use it again later on. 



How to do it...


We will use the iris flowers dataset that we first used in the pairs plot recipe in Chapter 1:


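panel.cor <- function(x, y, ...)
{
# Save the current user coordinates and restore them when the function exits
usr <- par("usr")
on.exit(par(usr=usr))
# Use a 0-1 coordinate system inside the panel
par(usr=c(0,1,0,1))
# Correlation formatted to two significant digits
txt <- as.character(format(cor(x,y), digits=2))
# Label size is proportional to the strength of the correlation
text(0.5, 0.5, txt, cex=6*abs(cor(x,y)))
}

pairs(iris[1:4], upper.panel=panel.cor)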
How it works...


We have basically used the pairs() function to make the graph, but in addition to the
dataset we also set the upper.panel argument to panel.cor, which is a function we
define beforehand. The upper.panel argument refers to the squares in the top-right half of
the graph, above the diagonal going from the top-left to the bottom-right. Correspondingly,
there is also a lower.panel argument for the bottom-left half of the graph.

The panel.cor value is defined as a function using the following notation:

newfunction <- function(arg1, arg2, ...)

{

# function code here

}

The panel.cor function does a few different things. First it sets the individual panel block
axes limits to c(0,1,0,1) using the par() command. Then it calculates the correlation
co-efficient between a pair of variables, formats it to two significant digits as a text
string, and passes it to the text() function, which places it in the center of
each block. Also note that the size of the labels is set using the cex argument to a multiple
of the absolute value of the correlation co-efficient. Thus the size of the value label also
indicates how strong the correlation is.

Panel functions are also one of the most powerful features of the lattice package. To learn
more about them and the package, please refer to the excellent book "Lattice: Multivariate
Data Visualization with R" by Deepayan Sarkar, who is also the author of the package. The
book website is at:

http://lmdvr.r-forge.r-project.org/figures/figures.html
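The lower half of the pairs plot can show something complementary. In this sketch, panel.smooth from the graphics package draws the points with a lowess line in the lower panels. panel.cor is repeated so that the block runs on its own. The correlation matrix is also computed so the numbers printed in the upper panels can be checked.

```r
# Same upper-panel function as in the recipe
panel.cor <- function(x, y, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))
  par(usr = c(0, 1, 0, 1))
  r <- cor(x, y)
  text(0.5, 0.5, format(r, digits = 2), cex = 6 * abs(r))
}

# panel.smooth adds a lowess smoother in the lower panels
pairs(iris[1:4], upper.panel = panel.cor, lower.panel = panel.smooth)

# The pairwise correlations shown in the upper panels
cor_matrix <- round(cor(iris[1:4]), 2)
```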


Adding error bars 


In most scientific data visualization, error bars are necessary to show the level of confidence 
in the data. However, there is no pre-defined function in the base R library for drawing error 
bars. In this recipe we will learn how to draw error bars in scatterplots. 


Getting ready 


All you need for the next recipe is to type it at the R prompt as we will use some base library 
functions to define a new error bar function. You may also save the recipe code as a script so 
that you can use it again later on. 
How to do it...
x1=mtcars$disp*1.05,
y1=mtcars$mpg,
angle=90,
code=3,
length=0.04,
lwd=0.4)




How it works...


In the previous two examples we used the arrows() function to draw horizontal and vertical
error bars. arrows() is a base graphics function for drawing different kinds of arrows. It
provides various arguments to adjust the size, location, and shape of the arrows such that
they can be used as error bars.

The first four arguments define the location of the start and end points of the arrows. The first
two arguments, x0 and y0, are co-ordinates of the starting points and the next two arguments,
x1 and y1, are co-ordinates of the end points of the arrows.

For drawing vertical error bars, say with a 5% error both ways, we set both x0 and x1 to
the x location of the data points (in this case mtcars$disp) and we set y0 and y1 to
the y values of the data points plus and minus the error margin (1.05*mtcars$mpg and
0.95*mtcars$mpg respectively).






Similarly, for drawing horizontal error bars we have the same y co-ordinate for the start and 
end, but add and subtract the error margin from the x co-ordinates of the data points. 

The angle argument sets the angle between the shaft of the arrow and the edge of
the arrowhead. The default value is 30 (which looks more like an arrow), but to use it as an
error bar we set it to 90, which flattens the arrowhead into a cap perpendicular to the shaft.

The code argument sets the type of arrow to be drawn. Setting it to 3 means drawing an 
arrowhead at both ends. 

The length and lwd arguments set the length of the arrowheads (in inches) and the line
width of the arrow respectively.
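Putting these arguments together, the sketch below draws vertical error bars of 5% above and below each mpg value. This is a self-contained version written for illustration, not the recipe's exact code. The plot's ylim is widened so the bars are not clipped.

```r
x  <- mtcars$disp
y  <- mtcars$mpg
y0 <- y * 0.95   # lower end: 5% below each value
y1 <- y * 1.05   # upper end: 5% above each value

plot(x, y, ylim = range(y0, y1), xlab = "disp", ylab = "mpg")
# Same x at both ends; angle = 90 and code = 3 give flat caps at both ends
arrows(x0 = x, y0 = y0, x1 = x, y1 = y1,
       angle = 90, code = 3, length = 0.04, lwd = 0.4)
```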


There's more... 


The Hmisc package has the errbar() function, which can be used to draw vertical error bars.
The plotrix package has the plotCI() function, which can be used to draw error bars or
confidence intervals. If we do not wish to write our own error bar function using arrows(),
it's easier to use one of these packages.


Using jitter to distinguish closely packed data points


Sometimes when working with large datasets, we may find that a lot of data points on a
scatter plot overlap each other. In this recipe we will learn how to distinguish between closely
packed data points by adding a small amount of noise with the jitter() function.


Getting ready 


All you need for the next recipe is to type it at the R prompt as we will only use some base
functions. You may also save the recipe code as a script so that you can use it again later on.



How to do it...


First let's make a graph which has a lot of overlapping points:


x <- rbinom(1000, 10, 0.25)
y <- rbinom(1000, 10, 0.25)
plot(x, y)







plot(jitter(x), jitter(y))
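By default, jitter() picks the noise size from the data. The amount argument sets it explicitly, adding uniform noise between -amount and +amount. In this sketch, an amount of 0.2 and semi-transparent points are illustrative choices. With that amount, points from neighboring integer values stay visually separate.

```r
set.seed(42)
x <- rbinom(1000, 10, 0.25)
y <- rbinom(1000, 10, 0.25)

# amount = 0.2 adds uniform noise in [-0.2, 0.2] to each value
jx <- jitter(x, amount = 0.2)
jy <- jitter(y, amount = 0.2)

# Semi-transparent points make dense clusters easier to see
plot(jx, jy, pch = 20, col = rgb(0, 0, 0, 0.3))
```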
How it works...


In the first graph, we plotted 1,000 random data points generated with the rbinom()
function. However, as you can see in the first graph, only a few data points are visible because
there are multiple data points in the exact same location. Then when we plotted the points
by applying the jitter() function to the x and y values, we could see a lot more of the 1,000
points. We can also see that most of the data is in the range of x and y values of 2 to 4.


Adding linear model lines


In this recipe we will learn how to fit a linear model and plot the linear regression line on a
scatter plot.


Getting ready


All you need for the next recipe is to type it at the R prompt as we will only use some base
functions. You may also save the recipe code as a script so that you can use it again later on.


How to do it...


Once again, let's use the mtcars dataset and draw a linear fit line for mpg versus disp:

plot(mtcars$mpg~mtcars$disp)

lmfit <- lm(mtcars$mpg~mtcars$disp)
abline(lmfit)
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We first draw the basic scatter plot of mp g versus d i s p. Then we fit a linear model to the data 
using the I m( ) function, which takes a formula in the form y ~x as its argument. Finally, we 
pass the linear fit to the a bl i ne( ) function, which reads the intercept and slope saved in the 
I mf i t object to draw a line. 
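To see the values that abline() reads from the fitted model, the coefficients can be pulled out with coef(). The sketch below uses the data argument of lm() (an equivalent, tidier alternative to the $ form above) and draws the same line a second time from the explicit intercept and slope, to show they agree.

```r
lmfit <- lm(mpg ~ disp, data = mtcars)
b <- coef(lmfit)          # named vector: "(Intercept)" and "disp"
print(b)

plot(mpg ~ disp, data = mtcars)
abline(lmfit)                                    # reads the coefficients itself
abline(a = b[1], b = b[2], col = "red", lty = 2) # same line, drawn explicitly
```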


Adding non-linear model curves 


In this recipe, we will see how to fit and draw a non-linear model curve to a dataset. 


Getting ready 


All you need for the next recipe is to type it at the R prompt as we will only use some base 
functions. You may also save the recipe code as a script so that you can use it again later on. 


How to do it...


First, create some data that follows an exponential curve, fit a non-linear model to it, and plot
the data together with the model's predictions:


x <- -(1:100)/10

y <- 100 + 10 * exp(x / 2) + rnorm(x)/10

nlmod <- nls(y ~ Const + A * exp(B * x), trace=TRUE)


plot(x, y)

lines(x, predict(nlmod), col="red")
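How it works...


We first define x as a sequence of negative values, using the sequence operator : and dividing
by 10, and y as an exponential function of x with a little random noise added. Then we fit a
non-linear model of the form y ~ Const + A * exp(B * x) to the data using the nls() function.
Because we did not supply starting values, nls() warns and starts all three parameters at 1;
setting trace=TRUE prints the progress of each iteration. We save the model fit as nlmod and
finally draw the model's predicted values by passing x and predict(nlmod) to the lines() function.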
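Since the data were simulated with Const = 100, A = 10, and B = 0.5, the fitted coefficients can be compared with those true values. This sketch supplies the starting values explicitly (all 1, the same values nls() falls back on), which avoids the warning; set.seed() is added only for reproducibility.

```r
set.seed(42)                                   # reproducibility only
x <- -(1:100)/10
y <- 100 + 10 * exp(x / 2) + rnorm(x)/10

# Explicit starting values for the three parameters
nlmod <- nls(y ~ Const + A * exp(B * x),
             start = list(Const = 1, A = 1, B = 1))
est <- coef(nlmod)
print(round(est, 3))

plot(x, y)
lines(x, predict(nlmod), col = "red")
```

With real data, starting values close to plausible parameter values make convergence much more likely than the default of 1.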


Adding non-parametric model curves 
with lowess 


In this recipe, we will learn how to use lowess, a non-parametric model, and add the resulting 
prediction curve to a scatter plot. 


Getting ready 


For this recipe, we don't need to load any additional libraries. We just need to type the recipe 
at the R prompt or run it as a script. 


How to do it...


First, let's make a simple scatter plot with the pre-loaded cars dataset and add a couple of 
lowess lines to it: 


plot(cars, main = "lowess(cars)")

lines(lowess(cars), col = "blue")

lines(lowess(cars, f = 0.3), col = "orange")
How it works...


Standard R sessions include the lowess() function. It is a smoother which uses locally
weighted polynomial regression. The first argument, in this instance, is a data frame called
cars giving the x and y variables (speed and dist). So we apply the lowess() function to the
dataset cars and in turn pass that result to the lines() function. The result of lowess()
is a list with components named x and y. The lines() function automatically detects that
and uses the appropriate values to draw a smooth line through the scatter plot. The second
smooth line has an additional argument, f, which is known as the smoother span. This gives
the proportion of points in the plot which influence the smoothing at each value. Larger
values give more smoothness. The default value is 2/3 (approximately 0.67), so when we changed
it to 0.3 we got a less smooth fit.
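The list returned by lowess() can be inspected directly, which makes it clear why lines() can use it as is. This sketch stores both fits and adds a legend (the legend is an addition for readability, not part of the recipe).

```r
fit_default <- lowess(cars)            # f = 2/3
fit_narrow  <- lowess(cars, f = 0.3)

str(fit_default)    # list with components x (sorted speed) and y (smoothed dist)

plot(cars, main = "lowess(cars)")
lines(fit_default, col = "blue")
lines(fit_narrow, col = "orange")
legend("topleft", legend = c("f = 2/3 (default)", "f = 0.3"),
       col = c("blue", "orange"), lty = 1, bty = "n")
```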


Making three-dimensional scatter plots 


In this recipe we will learn how to make three-dimensional scatter plots which can be very 
useful when we want to explore the relationships between more than two variables at a time. 


Getting ready 


We need to install and load the scatterplot3d package in order to run this recipe:

install.packages("scatterplot3d")

library(scatterplot3d)


Let's make the simplest default 3D scatter plot with our mtcars dataset:

scatterplot3d(mtcars$wt, mtcars$disp, mtcars$mpg)


How it works...


That was easy! The scatterplot3d() function works much like the basic plot() function. In
the previous example, all we had to provide were wt, disp, and mpg from the mtcars dataset
as the x, y, and z arguments respectively.


There's more...


Just like plot() and other graph functions, scatterplot3d() accepts a number of
additional arguments with which we can configure the graph in many ways. Let's try some of
these additional settings.

Let's add a title to the graph, change the plotting symbol and the angle of viewing, add
highlighting, and add vertical drop lines to the x-y plane:

scatterplot3d(mtcars$wt, mtcars$disp, mtcars$mpg,
pch=16, highlight.3d=TRUE,
angle=20, # illustrative viewing angle
xlab="Weight", ylab="Displacement",
type="h",
main="Relationships between car specifications")

As you can see, we changed some of the graph settings using arguments we have already
used before in the plot() function. These include the axis titles, graph title, and symbol
type. In addition, we added some color highlighting by setting the highlight.3d argument
to TRUE, which draws the points in different colors related to the y co-ordinates (disp). The
angle argument is used to set the angle between the x and y axes, which controls the point
from which we view the data. Finally, setting type to "h" adds the vertical lines to the x-y plane,
which makes reading the graph easier.


[Figure: Relationships between car specifications]

For more advanced three-dimensional data visualization in R, please have a look at the
rggobi package, which allows interactive analysis with 3D plots. The package can be
installed like any other R package:

install.packages("rggobi")

Please see the package website for more details at http://www.ggobi.org/rggobi/.


How to make Quantile-Quantile plots 


In this recipe, we will see how to make Quantile-Quantile (Q-Q) plots, which are useful for 
comparing two probability distributions. 


Getting ready 


For this recipe, we don't need to load any additional libraries. We just need to type 
the recipe at the R prompt or run it as a script. 


How to do it... 


Let's see how the distribution of mpg in the mtcars dataset compares with a normal
distribution using the qqnorm() function:

qqnorm(mtcars$mpg)
qqline(mtcars$mpg)
[Figure: Normal Q-Q plot of mtcars$mpg (x axis: Theoretical Quantiles)]


In the example, we used the qqnorm() function to create a normal Q-Q plot of mpg values.
We added a straight line with the qqline() function, which by default passes through the first
and third quartiles. The closer the points lie to this line, the closer the distribution is to a
normal one.
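qqnorm() can also return the co-ordinates instead of drawing them, via plot.it = FALSE. A rough numerical companion to the visual check (an informal heuristic, not a formal test) is the correlation between the theoretical and sample quantiles: values close to 1 mean the points lie close to a straight line. Base R's shapiro.test() is included as one formal alternative.

```r
q <- qqnorm(mtcars$mpg, plot.it = FALSE)   # list: x = theoretical, y = sample
r <- cor(q$x, q$y)
round(r, 3)

# A formal normality test from the stats package, for comparison
shapiro.test(mtcars$mpg)
```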


There's more... 


Another way of making a Q-Q plot is by calling the plot() function on a model fit. For
example, let's plot the following linear model fit:

lmfit <- lm(mtcars$mpg~mtcars$disp)
par(mfrow=c(2,2))
plot(lmfit)


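This produces four diagnostic plots: Residuals vs Fitted, Normal Q-Q, Scale-Location, and
Residuals vs Leverage. The second plot is a normal Q-Q plot of the model's standardized
residuals, which shows how closely the residuals follow a normal distribution.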
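To draw only the Q-Q panel, plot() on an lm object (the plot.lm() method) accepts a which argument, and which = 2 selects the normal Q-Q plot. The sketch below shows that, next to an essentially equivalent plot built by hand from rstandard().

```r
lmfit <- lm(mpg ~ disp, data = mtcars)

par(mfrow = c(1, 2))
plot(lmfit, which = 2)            # only the Normal Q-Q panel

# The same idea built by hand from the standardized residuals
r <- rstandard(lmfit)
qqnorm(r, main = "Standardized residuals")
qqline(r)
par(mfrow = c(1, 1))
```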
As can be seen from the example, the rug() function adds a set of lines just above the X
axis. A line or tick mark is placed wherever there is a point at that particular X location. So, the
more closely packed together the lines are, the higher the data density around those X values.
This is expected in the example, because in a normal distribution most values lie around
the mean (in this case zero).

The rug() function in its simplest form only takes one numeric vector as its argument. Note
that it draws on top of an existing plot.
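A minimal sketch of this kind of plot, assuming 1,000 standard normal values drawn on top of a density curve (both the data and the base plot are assumptions chosen to match the description), is:

```r
set.seed(123)          # reproducibility only
x <- rnorm(1000)       # values centred on a mean of zero

plot(density(x), main = "Density of x with a rug")
rug(x)                 # one tick per value along the bottom (side = 1) axis
```

The ticks bunch up near zero, matching the peak of the density curve.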


There's more... 


Let's take another example and explore some of the additional arguments that can be passed
to rug(). We will use the example metals.csv dataset:

metals <- read.csv("metals.csv")
plot(Ba~Cu, data=metals, xlim=c(0,100))
rug(metals$Cu)

rug(metals$Ba, side=2, col="red", ticksize=0.02)
We first read the metals.csv file and plotted barium (Ba) concentrations against copper (Cu)
concentrations. Next, we added a rug of Cu values on the X axis using the default settings.
Then we added another rug for Ba values on the Y axis by setting the side argument to 2.

The side argument takes four values: 

► 1: bottom axis (default) 

► 2: left 

► 3: top 

► 4: right 

We also set the color of the tick marks to red using the col argument. Finally, we adjusted
the size of the tick marks using the ticksize argument, which takes numeric values as a
fraction of the width of the plotting area. Positive values draw inward ticks and negative values
draw ticks on the outside.
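Since metals.csv is not bundled with R, the same arguments can be tried on the built-in mtcars data (a substitute chosen for illustration). The sketch also uses jitter() on the bottom rug: several cars share the same weight, and identical values would otherwise be drawn as a single tick.

```r
# Tied values in wt would otherwise draw on top of each other
n_tied <- sum(duplicated(mtcars$wt))

plot(mpg ~ wt, data = mtcars)
rug(jitter(mtcars$wt, amount = 0.01))                     # bottom axis
rug(mtcars$mpg, side = 2, col = "red", ticksize = 0.02)   # inward, left axis
rug(mtcars$mpg, side = 4, col = "blue", ticksize = -0.02) # outward, right axis
```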


Making scatter plots with smoothed density 
representation 


Smoothed density scatter plots are a good way of visualizing large datasets. In this recipe, we 
will learn how to make them using the smoothScatter() function. 


Getting ready 


For this recipe, we don't need to load any additional libraries. We just need to type the recipe 
at the R prompt or run it as a script. 






Creating Scatter Plots 



We will use the smoothScatter() function, which is part of the base graphics library.
We will use an example based on the help file, which can be accessed from the R prompt with
the help command (?smoothScatter):

n <- 10000

x <- matrix(rnorm(n), ncol=2)

y <- matrix(rnorm(n, mean=3, sd=1.5), ncol=2)

smoothScatter(x, y)




The smoothScatter() function produces a smoothed color density representation of the
scatter plot, obtained through a kernel density estimate. We passed the x and y variables,
which represent the data to be plotted. The gradient of the blue color shows the density of
the data points, with most points in the center of the graph. The individual dots in the outer,
light blue region are points from the areas of lowest density (by default, 100 such points are
drawn), which are typically the outliers.


There's more... 


We can pass a number of arguments to smoothScatter() to adjust the smoothing,
for example nbin for specifying the number of equally spaced grid points for the density
estimation, and nrpoints to specify how many points to show as dots. In addition, we can
also pass standard arguments such as xlab, ylab, pch, cex, and so on to modify axis and
plotting symbol characteristics.
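A short sketch comparing the defaults with a finer density grid and no individual points; nbin = 256 and nrpoints = 0 are illustrative values, and set.seed() is added only for reproducibility. smoothScatter() relies on the KernSmooth package, which ships with standard R installations.

```r
set.seed(10)                       # reproducibility only
n <- 10000
x <- matrix(rnorm(n), ncol = 2)
y <- matrix(rnorm(n, mean = 3, sd = 1.5), ncol = 2)

par(mfrow = c(1, 2))
smoothScatter(x, y, main = "Defaults")
# Finer density grid, no individual points, custom axis labels
smoothScatter(x, y, nbin = 256, nrpoints = 0,
              xlab = "x", ylab = "y", main = "nbin = 256, nrpoints = 0")
par(mfrow = c(1, 1))
```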

Creating Line Graphs and Time Series Charts


In this chapter, we will cover: 

► Adding customized legends for multiple line graphs 

► Using margin labels instead of legends for multiple line graphs 

► Adding horizontal and vertical grid lines 

► Adding marker lines at specific X and Y values 

► Creating sparklines 

► Plotting functions of a variable in a dataset 

► Formatting time series data for plotting 

► Plotting date and time on the X axis 

► Annotating axis labels in different human readable time formats 

► Adding vertical markers to indicate specific time events 

► Plotting data with varying time averaging periods 

► Creating stock charts 




Introduction 


In Chapter 1, Basic Graph Functions and Chapter 2, Beyond the Basics: Adjusting Key 
Parameters, we learnt some basics of how to make line graphs and customize them by setting 
certain arguments as per our needs. In this chapter, we will learn some more intermediate 
to advanced recipes for customizing line graphs even further. We will look at ways to improve 
and speed up line graphs with multiple lines representing more than one variable. 

One of the most common forms of line graph is the time trend or time series, where the X variable is
some measure of time such as year, month, week, day, hour, and so on. Reading, formatting,
and plotting dates can be quite tricky in R. In this chapter, we will see how to deal with dates 
and process them to make time series charts with custom annotations, grid lines, uncertainty 
bounds, and markers. 

We will also learn to make some interesting and popular types of time series charts such as 
sparklines and stock charts. 


As the recipes in this chapter are slightly more advanced than the earlier chapters, it may take 
some practice with multiple datasets before you are comfortable with using all the functions. 
Example datasets are used in each recipe, but it is highly recommended to also work with 
your own datasets and modify the recipes to suit your own analysis. 


Adding customized legends for multiple 
line graphs 


Line graphs with more than one line, representing more than one variable, are quite common 
in any kind of data analysis. In this recipe we will learn how to create and customize legends 
for such graphs. 


Getting ready 


We will use the base graphics library for this recipe, so all you need to do is run the recipe at 
the R prompt. It is good practice to save your code as a script to use again later. 



Once again we will use the cityrain.csv example dataset that we used in Chapter 1 and
Chapter 2.


rain <- read.csv("cityrain.csv")
plot(rain$Tokyo, type="b", lwd=2,
xaxt="n", ylim=c(0,300), col="black",
xlab="Month", ylab="Rainfall (mm)",
main="Monthly Rainfall in major cities")

axis(1, at=1:length(rain$Month), labels=rain$Month)

lines(rain$Berlin, col="red", type="b", lwd=2)

lines(rain$NewYork, col="orange", type="b", lwd=2)

lines(rain$London, col="purple", type="b", lwd=2)

legend("topright", legend=c("Tokyo","Berlin","New York","London"),
lty=1, lwd=2, pch=21, col=c("black","red","orange","purple"),
ncol=2, bty="n", cex=0.8,
text.col=c("black","red","orange","purple"),
inset=0.01)

[Figure: Monthly Rainfall in major cities]


How it works...


We used the legend() function, which we have already come across in earlier chapters. It
is quite a flexible function and allows us to adjust the placement and styling of the legend in
many ways.

The first argument we passed to legend() specifies the position of the legend within the
plot region. We used "topright"; other possible values are "bottomright", "bottom",
"bottomleft", "left", "topleft", "top", "right", and "center". We can also
specify the location of the legend with x and y co-ordinates, as we will soon see.
The other important arguments specific to lines are lwd and lty, which specify the line width
and type drawn in the legend box respectively. It is important to keep these the same as the
corresponding values in the plot() and lines() commands. We also set pch to 21 to
replicate the type="b" argument in the plot() command. cex and text.col set the size
and colors of the legend text. Note that we set the text colors to the same colors as the lines
they represent. Setting bty (box type) to "n" ensures no box is drawn around the legend. This
is good practice as it keeps the look of the graph clean. ncol sets the number of columns
over which the legend labels are spread and inset sets the inset distance from the margins
as a fraction of the plot region.
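One simple way to keep the legend in step with the lines is to store the styles in vectors and pass the same vectors to both. This sketch swaps in the built-in mdeaths and fdeaths time series (monthly UK lung-disease deaths for males and females), an assumption made because cityrain.csv is not bundled with R. legend() invisibly returns a list describing the box and text positions, which is captured here.

```r
cols <- c("steelblue", "firebrick")
lwds <- c(2, 2)
ltys <- c(1, 1)

plot(mdeaths, col = cols[1], lwd = lwds[1], lty = ltys[1],
     ylim = range(0, mdeaths), ylab = "Deaths", main = "Monthly deaths by sex")
lines(fdeaths, col = cols[2], lwd = lwds[2], lty = ltys[2])

# Reusing the same vectors keeps the legend keys identical to the lines
leg <- legend("topright", legend = c("Male", "Female"),
              col = cols, lwd = lwds, lty = ltys, text.col = cols,
              ncol = 2, bty = "n", cex = 0.8, inset = 0.01)
```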


There's more... 


Let's experiment by changing some of the arguments discussed: 

legend(1, 300, legend=c("Tokyo","Berlin","New York","London"),
lty=1, lwd=2, pch=21, col=c("black","red","orange","purple"),
horiz=TRUE, bty="n", bg="yellow", cex=1,
text.col=c("black","red","orange","purple"))


[Figure: Monthly Rainfall in major cities, with a horizontal legend]

This time we used x and y co-ordinates instead of a keyword to position the legend.

We also set the horiz argument to TRUE. As the name suggests, horiz makes the
legend labels horizontal instead of the default vertical. Specifying horiz overrides the
ncol argument. Finally, we made the legend text bigger by setting cex to 1 and did not
use the inset argument. The bg argument sets the background color of the legend box, but it is
only used when a box is drawn, so it has no visible effect here because bty is "n".



An alternative way of creating the previous plot without having to call plot() and lines()
multiple times is to use the matplot() function. To see details on how to use this function,
please see the help file by running ?matplot or help(matplot) at the R prompt.
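A minimal matplot() sketch: each column of a matrix becomes one line, so a single call replaces the plot() plus lines() sequence. The built-in mdeaths and fdeaths series stand in for the rainfall data here (an assumption, since cityrain.csv is not bundled with R).

```r
# Monthly male and female lung-disease deaths in the UK, one column each
m <- cbind(Male = as.numeric(mdeaths), Female = as.numeric(fdeaths))
cols <- c("steelblue", "firebrick")

# One call draws a line (with points) per column of m
matplot(m, type = "b", pch = 21, lty = 1, lwd = 2, col = cols,
        xlab = "Month index", ylab = "Deaths")
legend("topright", legend = colnames(m), col = cols,
       lty = 1, lwd = 2, pch = 21, bty = "n")
```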


See also 


Have a look at the next recipe, which shows a way to label lines directly instead 
of using a legend. 


Using margin labels instead of legends 
for multiple line graphs 


While legends are the most commonly used method of providing a key to read multiple 
variable graphs, they are often not the easiest to read. Labelling lines directly is one way 
of getting around that problem. 


Getting ready 


We will use the base graphics library for this recipe, so all you need to do is run the recipe 
at the R prompt. It is good practice to save your code as a script to use again later. 



Let's use the gdp.txt example dataset to look at the trends in the annual GDP
of five countries:


gdp <- read.table("gdp_long.txt", header=TRUE)


library(RColorBrewer)
pal <- brewer.pal(5, "Set1")

par(mar=par()$mar+c(0,0,0,2), bty="l")

plot(Canada~Year, data=gdp, type="l", lwd=2, lty=1, ylim=c(30,60),
col=pal[1], main="Percentage change in GDP", ylab="")

mtext(side=4, at=gdp$Canada[length(gdp$Canada)], text="Canada",
col=pal[1], line=0.3, las=2)


lines(gdp$France~gdp$Year, col=pal[2], lwd=2)


mtext(side=4, at=gdp$France[length(gdp$France)], text="France",
col=pal[2], line=0.3, las=2)


lines(gdp$Germany~gdp$Year, col=pal[3], lwd=2)


mtext(side=4, at=gdp$Germany[length(gdp$Germany)], text="Germany",
col=pal[3], line=0.3, las=2)

lines(gdp$Britain~gdp$Year, col=pal[4], lwd=2)

mtext(side=4, at=gdp$Britain[length(gdp$Britain)], text="Britain",
col=pal[4], line=0.3, las=2)

lines(gdp$USA~gdp$Year, col=pal[5], lwd=2)


mtext(side=4, at=gdp$USA[length(gdp$USA)]-2,
text="USA", col=pal[5], line=0.3, las=2)


[Figure: Percentage change in GDP, with the lines labelled France, Britain, Germany, Canada, and USA in the right margin]



We first read the gdp.txt data file using the read.table() function. Next we
loaded the RColorBrewer package and set our color palette pal to "Set1"
(with five colors).

Before drawing the graph, we used the par() command to add extra space to the right
margin, so that we have enough space for the labels. Depending on the size of the text labels,
you may have to experiment with this margin until you get it right. We also set the box type
(bty) to an L-shape ("l") so that there is no line on the right margin. We can also set it to "c"
if we want to keep the top line.
We used the mtext() function to label each of the lines individually in the right margin. The
first argument we passed to the function is the side where we want the label to be placed.
Sides (margins) are numbered starting from 1 for the bottom side and going round in a
clockwise direction so that 2 is left, 3 is top, and 4 is right.

The at argument was used to specify the Y co-ordinate of the label. This is a bit tricky because
we have to make sure we place the label as close to the corresponding line as possible. So, here
we have used the last value of each line. For example, gdp$France[length(gdp$France)]
picks the last value in the France vector by using its length as the index. Note that we had to
adjust the value for USA by subtracting 2 from its last value so that it doesn't overlap the label
for Canada.

We used the text argument to set the text of the labels as country names. We set the col
argument to the appropriate element of the pal vector by using a number index. The line
argument sets an offset in terms of margin lines, starting at 0 and counting outwards. Finally,
setting las to 2 rotates the labels to be perpendicular to the axis, instead of the default
value of 0, which makes them parallel to the axis.


There's more... 


Sometimes, simply using the last value of a set of values may not work because the value 
may be missing. In that case we can use the second last value or visually choose a value that 
places the label closest to the line. Also, the size of the plot window and the proximity of the 
final values may cause overlapping of labels. So, we may need to iterate a few times before we 
get the placement right. We can write functions to automate this process but it is still good to 
visually inspect the outcome. 
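The automation mentioned above could look something like the following sketch. Both helper names, last_value() and label_lines(), are hypothetical, and the small toy data frame exists only to exercise them. A nudge argument is kept because, as noted, overlapping labels still need to be checked and moved by hand.

```r
# Hypothetical helper: last non-missing value of a vector
last_value <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0) NA_real_ else v[length(v)]
}

# Hypothetical helper: label each series in the right margin at its last
# non-missing value; 'nudge' shifts individual labels (in y units) by hand
label_lines <- function(df, series, cols, nudge = rep(0, length(series))) {
  at <- vapply(series, function(s) last_value(df[[s]]), numeric(1)) + nudge
  mtext(series, side = 4, at = at, col = cols, line = 0.3, las = 2)
  invisible(at)
}

# Toy data (illustrative only): series A ends with a missing value
toy <- data.frame(Year = 2001:2004, A = c(3, 4, 5, NA), B = c(1, 2, 2, 3))
par(mar = par("mar") + c(0, 0, 0, 2), bty = "l")
plot(A ~ Year, data = toy, type = "l", ylim = c(0, 6), col = "red", ylab = "")
lines(B ~ Year, data = toy, col = "blue")
at <- label_lines(toy, c("A", "B"), c("red", "blue"))
```

Label A is placed at 5, its last observed value, rather than being dropped because of the trailing NA.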


Adding horizontal and vertical grid lines 


In this recipe we will learn how to add and customize grid lines to graphs. 


Getting ready 


We will use the base graphics for this recipe, so all you need to do is run the recipe at the R 
prompt. It is good practice to save your code as a script to use again later. 
[Figure: Monthly Rainfall in Tokyo]



We can control the grid lines using the nx and ny arguments, which set the number of grid
cells in the x and y directions and so correspond to the vertical and horizontal grid lines
respectively. By default, nx is NULL (and ny takes the same value as nx), which draws grid lines
aligned with the default tick marks on both axes. If we do not wish to draw grid lines in a
particular direction, we can set nx or ny to NA. If nx is set to NA, no vertical grid lines are
drawn, and if ny is set to NA, no horizontal grid lines are drawn.


There's more...


The default grid lines are thin, dotted, and light gray, so they can barely be seen. We can
customize the styling of the grid lines using the lwd, lty, and col arguments:

grid(nx=NA, ny=8,

lwd=1, lty=2, col="blue")
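Because grid lines with ny = NULL sit at the y-axis tick positions, the same positions can also be drawn by hand with abline() and axTicks(), which is handy when the lines need to match other abline() markers. This sketch uses the built-in pressure dataset purely for illustration.

```r
plot(pressure ~ temperature, data = pressure, type = "b")

# Horizontal grid lines only, aligned with the y-axis tick marks
grid(nx = NA, ny = NULL, col = "gray70", lty = 2)

# The same tick positions, retrieved explicitly and drawn with abline()
yt <- axTicks(2)
abline(h = yt, col = "gray70", lty = 3)
```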


See also


In the next recipe we will learn to use the abline() function, which we can use to draw lines
at any specific X and Y locations.




























Adding marker lines at specific 
X and Y values 


Sometimes we may only want to draw one or a few lines to indicate specific cut-off or
threshold values. In this recipe, we will learn how to do that using the abline() function.


Getting ready 


We will use the base graphics library for this recipe, so all you need to do is run the recipe at 
the R prompt. It is good practice to save your code as a script to use again later. 


How to do it... 


Let's draw a vertical line at the month of September in the rainfall graph for Tokyo: 

rain <- read.csv("cityrain.csv")
plot(rain$Tokyo, type="b", lwd=2,
xaxt="n", ylim=c(0,300), col="black",
xlab="Month", ylab="Rainfall (mm)",
main="Monthly Rainfall in Tokyo")
axis(1, at=1:length(rain$Month), labels=rain$Month)

abline(v=9)
How it works...


To draw marker lines with abline() at specific X or Y locations, we have to set the v (as in
vertical) or h (as in horizontal) arguments respectively. In the example, we set v=9 (the index
of the month September in the Month vector).


There's more...


Now let's add a red dashed horizontal line to the graph to denote a high rainfall cutoff
of 150 mm:

abline(h=150, col="red", lty=2)
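The h and v arguments also accept vectors, so several marker lines can be drawn in one call. In this sketch the built-in Nile dataset (annual river flow, 1871 to 1970) and the chosen markers (the 10th and 90th percentiles, the mean, and every 20 years) are illustrative assumptions.

```r
plot(Nile, ylab = "Annual flow", main = "Nile river flow")

# Several horizontal markers at once: 10th and 90th percentiles
abline(h = quantile(Nile, c(0.1, 0.9)), col = "red", lty = 2)
# A single thicker line at the mean
abline(h = mean(Nile), col = "blue", lwd = 2)
# Vertical reference lines every 20 years
abline(v = seq(1880, 1960, by = 20), col = "gray", lty = 3)
```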


Creating sparklines


Sparklines are small and simple line graphs, useful for summarizing trend data in a small
space. The word "sparklines" was coined by Prof. Edward Tufte. In this recipe we will learn how
to make sparklines using the basic plot() function.


Getting ready 


We will use the base graphics library for this recipe, so all you need to do is run the recipe at 
the R prompt. It is good practice to save your code as a script to use again later. 


How to do it...


Let's represent our city rainfall data in the form of sparklines: 

rain <- read.csv("cityrain.csv") 

par(mfrow=c(4,1), mar=c(5,7,4,2), omi=c(0.2,2,0.2,2))

for (i in 2:5)
{
  plot(rain[,i], ann=FALSE, axes=FALSE, type="l",
       col="gray", lwd=2)

  mtext(side=2, at=mean(rain[,i]), names(rain[i]),
        las=2, col="black")

  mtext(side=4, at=mean(rain[,i]), mean(rain[,i]),
        las=2, col="black")

  points(which.min(rain[,i]), min(rain[,i]), pch=19, col="blue")
  points(which.max(rain[,i]), max(rain[,i]), pch=19, col="red")
}




[Figure: sparklines for Tokyo (mean 126.8), NewYork (93.975), London (51.35), and Berlin (48.075)]











Chapter 4 



The key feature of sparklines is to show the trend in the data with just one line, without any
axis annotations. In the example, we have shown the trend with a gray line. The minimum
and maximum values for each line are represented by blue and red dots respectively, while the
mean value is displayed in the right margin.

Since sparklines have to be very small graphics, we first set the margins such that the plot
area is small and the outer margins are large. We did this by setting the outer margins in
inches using the omi argument of the par() function. Depending on the dimensions of the
plot, R may produce an error saying that the figure margins are too large and not
draw the graph. In that case, we need to try lower values for the margins. Note that we also set up
a 4x1 layout with the mfrow argument.


Next we set up a for loop to draw a sparkline for each of the four cities. We drew the line with
the plot() command, setting both annotations (ann) and axes to FALSE. Then we used the
mtext() function to place the name of the city and the mean value of rainfall to the left and
right of the line respectively. Finally, we plotted the minimum and maximum values using the
points() command. Note that we used the which.min() and which.max() functions to get
the indices of the minimum and maximum values respectively and used them as the x values
for the points() function calls.
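The body of the loop can be wrapped in a small helper so that any numeric series can be drawn as a sparkline. This is a sketch: the function name sparkline() is hypothetical, and the built-in mdeaths and fdeaths series (monthly UK lung-disease deaths) stand in for cityrain.csv, which is not bundled with R.

```r
# Hypothetical helper: one sparkline with its mean in the right margin
# and the minimum (blue) and maximum (red) values marked
sparkline <- function(v, label) {
  v <- as.numeric(v)
  plot(v, ann = FALSE, axes = FALSE, type = "l", col = "gray", lwd = 2)
  mtext(label, side = 2, at = mean(v), las = 2)
  mtext(round(mean(v), 1), side = 4, at = mean(v), las = 2)
  points(which.min(v), min(v), pch = 19, col = "blue")
  points(which.max(v), max(v), pch = 19, col = "red")
  invisible(c(min = which.min(v), max = which.max(v)))
}

op <- par(mfrow = c(2, 1), mar = c(1, 6, 1, 4))
idx_m <- sparkline(mdeaths, "Male")
idx_f <- sparkline(fdeaths, "Female")
par(op)
```

Returning the positions of the extremes invisibly makes the helper easy to check without affecting the plot.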


Plotting functions of a variable in a dataset 


Sometimes we may wish to visualize the effect of applying a mathematical function to a set of 
values, instead of the original variable itself. In this recipe, we will see a simple method to plot 
functions of variables. 


Getting ready 


We will use the base graphics library for this recipe, so all you need to do is run the recipe at 
the R prompt. It is good practice to save your code as a script to use again later. 



Let's say we want to plot the difference in rainfall between Berlin and London. We can do that
just by passing the correct expression to the plot() function:

rain <- read.csv("cityrain.csv")

plot(rain$Berlin-rain$London, type="l", lwd=2,
xaxt="n", col="blue",
xlab="Month", ylab="Difference in Rainfall (mm)",
main="Difference in Rainfall between Berlin and London (Berlin-London)")

axis(1, at=1:length(rain$Month), labels=rain$Month)
abline(h=0, col="red")

[Figure: Difference in Rainfall between Berlin and London (Berlin-London)]


So, plotting a function of a variable is as simple as passing an expression to the plot()
function. In the example, the expression combined two variables in the dataset. We can
also plot transformations applied to any one variable.


There's more...


As another simple example, let's see how we can plot a polynomial function of a set
of numbers:

x <- 1:100

y <- x^3-6*x^2+5*x+10

plot(y~x, type="l", main=expression(f(x)==x^3-6*x^2+5*x+10))
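When the function is known in advance, base R's curve() can evaluate and draw it without creating x and y first. By default it evaluates the expression at n = 101 evenly spaced points between from and to, and it invisibly returns those points.

```r
# curve() evaluates an expression in x over a range and draws it
res <- curve(x^3 - 6*x^2 + 5*x + 10, from = 1, to = 100,
             main = expression(f(x) == x^3 - 6*x^2 + 5*x + 10))
head(res$x)
```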
Formatting time series data for plotting


Time series or trend charts are the most common form of line graphs. There are a lot of ways
in R to plot such data; however, it is important to first convert the data into a format that
R can understand. In this recipe, we will look at some ways of formatting time series data
using base R functions and the zoo package.


Getting ready 


In addition to the basic R functions, we will also be using the zoo package in this recipe. So
first we need to install it:


install.packages("zoo")


How to do it...


Let's use the dailysales.csv example dataset and format its date column:


sales <- read.csv("dailysales.csv")

d1 <- as.Date(sales$date, "%d/%m/%y")

d2 <- strptime(sales$date, "%d/%m/%y")

data.class(d1)

[1] "Date"


data.class(d2)
[1] "POSIXt"



We have seen two different functions to convert a character vector into dates. If we did not
convert the date column, R would not automatically recognize the values in the column as
dates. Instead, the column would be treated as a character vector or a factor.

The as.Date() function takes at least two arguments: the character vector to be converted
to dates and the format in which the dates are written in that vector. It returns an object of the
Date class, represented as the number of days since 1970-01-01, with negative values for earlier
dates. The values in the date column are in a DD/MM/YY format with two-digit years (you can
verify this by typing sales$date at the R prompt). So, we specify the format argument as
"%d/%m/%y". Please note that this order is important. If we instead use "%m/%d/%y", then our
days will be read as months and vice versa. The quotes around the value are also necessary.

The strptime() function is another way to convert character vectors into dates. However,
strptime() returns a different kind of object, of class POSIXlt, which is a named list of
vectors representing the different components of a date and time, such as year, month, day,
hour, minute, second, and a few more.

POSIXlt is one of the two basic classes of date/times in R. The other class, POSIXct,
represents the (signed) number of seconds since the beginning of 1970 (in the UTC time
zone) as a numeric vector. POSIXct is more convenient for including in data frames, and
POSIXlt is closer to human-readable forms. Both classes inherit from a virtual class, POSIXt.
That's why, when we ran the data.class() function on d2 earlier, we got POSIXt as the result.
In current versions of R, the class of d2 is c("POSIXlt", "POSIXt"), so data.class() returns
"POSIXlt" instead.

strptime() also takes a character vector to be converted and the format as arguments.
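A small sketch of the points above, using two illustrative date strings (not values from dailysales.csv). It shows the day count behind a Date, the effect of swapping %d and %m, the components stored in a POSIXlt object, and the seconds count behind a POSIXct object.

```r
# Illustrative date strings in DD/MM/YY form
s <- c("25/12/09", "05/03/09")

d1 <- as.Date(s, format = "%d/%m/%y")
d1                     # "2009-12-25" "2009-03-05"
as.numeric(d1)         # days since 1970-01-01

# Swapping %d and %m changes the meaning of the same string
as.Date(s[2], format = "%m/%d/%y")   # "2009-05-03"

# strptime() returns a POSIXlt list of date-time components
d2 <- strptime(s, format = "%d/%m/%y")
class(d2)              # "POSIXlt" "POSIXt" in current versions of R
d2$mday                # day of the month
d2$mon                 # months are counted from 0 (January = 0)
d2$year                # years since 1900

# POSIXct stores seconds since 1970-01-01 00:00 UTC
as.numeric(as.POSIXct("1970-01-02", tz = "UTC"))   # 86400
```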


There's more... 


The zoo package is handy for dealing with time series data. The zoo() function takes an
argument x, which can be a numeric vector, matrix, or factor. It also takes an order.by
argument, which has to be an index vector with unique entries by which the observations
in x are ordered:

library(zoo)


d3 <- zoo(sales$units, as.Date(sales$date, "%d/%m/%y"))

data.class(d3)

[1] "zoo"


See the help on DateTimeClasses to find out more details about the ways dates can be
represented in R.


Plotting date and time on the X axis 


In this recipe, we will learn how to plot formatted date or time values on the X axis. 


Getting ready 


For the first example, we only need to use the base graphics function plot().






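We will use the dailysales.csv example dataset, which contains the number of units
sold daily in a month:

sales<-read.csv("dailysales.csv")
plot(sales$units~as.Date(sales$date,"%d/%m/%y"),type="l",
xlab="Date",ylab="Units Sold")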








[Figure: Time trend of Oxides of Nitrogen; X axis: Time, 1998 to 2004]


The same graph can be made using zoo as follows:

plot(zoo(air$nox,as.Date(air$date,"%d/%m/%Y %H:%M")),
xlab="Time",ylab="Concentration (ppb)",
main="Time trend of Oxides of Nitrogen")


Annotating axis labels in different human 
readable time formats 


In this recipe, we will learn how to choose the formatting of time axis labels, instead of just 
using the defaults. 
There's more...


We can plot the example using the zoo() function as follows (assuming zoo
is already installed):

library(zoo)

plot(zoo(sales$units,as.Date(sales$date,"%d/%m/%y")))

Note that we don't need to specify x and y separately when plotting using zoo; we can just
pass the object returned by zoo() to plot(). We also need not specify the type as "l".

Let's look at another example, which has full date and time values on the X axis instead of just
dates. We will use the openair.csv example dataset for this example:

air<-read.csv("openair.csv")

plot(air$nox~as.Date(air$date,"%d/%m/%Y %H:%M"),type="l",
xlab="Time",ylab="Concentration (ppb)",
main="Time trend of Oxides of Nitrogen")


Getting ready


We will only use the basic R functions for this recipe. Make sure you are at the R prompt and
load the openair.csv dataset:

air<-read.csv("openair.csv")



Let's redraw our original example of plotting air pollution data from the last recipe, but with
labels for each month and year pairing:

plot(air$nox~as.Date(air$date,"%d/%m/%Y %H:%M"),type="l",
xaxt="n",
xlab="Time",ylab="Concentration (ppb)",
main="Time trend of Oxides of Nitrogen")

xlabels<-strptime(air$date,format="%d/%m/%Y %H:%M")
axis.Date(1,at=xlabels[xlabels$mday==1],format="%b-%Y")



[Figure: Time trend of Oxides of Nitrogen; Y axis: Concentration (ppb); X axis: Time, labelled Jan-1998 to Apr-2005]



In our original example of plotting air pollution data in the last recipe, we only formatted
the date/time vector to pass as an x argument to plot(), but the axis labels were chosen
automatically by R as the years 1998, 2000, 2002, and 2004. In this example, we drew a
custom axis with labels for each month and year pairing. To do that, we first suppressed the
default X axis by setting xaxt="n" in the plot() call.

We then created an object xlabels of class POSIXlt by using the strptime() function.
Then we used the axis.Date() function to add the X axis. axis.Date() is similar to the
axis() function and takes the side and at arguments. In addition, it also takes the format
argument, which we can use to specify the format of the labels. We specified the at argument
as a subset of xlabels for only the first day of each month by using the condition xlabels$mday==1. The format
value "%b-%Y" means the abbreviated month name followed by the full four-digit year.
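You can try the same idea without the openair.csv file. The sketch below uses a made-up daily series for 2003 stored as Date objects. It suppresses the default axis with xaxt = "n" and passes only the first day of each month to axis.Date(). Note that %b gives the month name in your current locale.

```r
dates  <- seq(as.Date("2003-01-01"), as.Date("2003-12-31"), by = "day")
values <- 50 + 20 * sin(seq_along(dates) / 20)   # hypothetical values

# Draw the series without the default X axis.
plot(dates, values, type = "l", xaxt = "n",
     xlab = "Time", ylab = "Value")

# Tick positions: the first day of every month.
ticks <- dates[as.POSIXlt(dates)$mday == 1]

# "%b-%Y": abbreviated month name and four-digit year, e.g. Jan-2003.
axis.Date(1, at = ticks, format = "%b-%Y")
```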
There's more...


See the help on strptime() for all the possible formatting options.


Adding vertical markers to indicate 
specific time events 


We may wish to indicate specific points of importance or measurements in a time series,
where there is a significant event or change in the data. In this recipe, we will learn how
to add vertical markers using the abline() function.


Getting ready 


We will only use the basic R functions for this recipe. Make sure you are at the R prompt and
load the openair.csv dataset:

air<-read.csv("openair.csv")


Let's take our air pollution time series example again and draw a red vertical line on
Christmas Day, 25/12/2003:

plot(air$nox~as.Date(air$date,"%d/%m/%Y %H:%M"),type="l",
xlab="Time",ylab="Concentration (ppb)",
main="Time trend of Oxides of Nitrogen")

abline(v=as.Date("25/12/2003","%d/%m/%Y"),col="red")


[Figure: Time trend of Oxides of Nitrogen; X axis: Time, 1998 to 2004]


We created a sequence of the Christmas dates for each year using the seq() function,
which takes from, to, and by arguments. Then we passed this vector to the abline()
function as v.

One important thing to note is that, by default, R does not deal with gaps in a time series.
There can be missing values denoted by NA and, as you can see in the previous examples,
the graphs show gaps in those places. However, if any dates or time intervals are missing
from the actual dataset, then R draws a line connecting the data points before and after
the gap instead of leaving it blank. In order to remove this connecting line, we must fill in
the missing time intervals in the gap and set the y values to NA.
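Here is a minimal base-R sketch of that fix, using made-up readings in which two whole days are missing. First we build the complete sequence of dates. Then we merge the observations onto it with all.x = TRUE, so that the missing days become rows with NA values. Finally we plot the result.

```r
# Hypothetical daily readings; 3 and 4 January are missing entirely.
obs <- data.frame(date  = as.Date(c("2003-01-01", "2003-01-02",
                                    "2003-01-05", "2003-01-06")),
                  value = c(10, 12, 15, 11))

# Complete daily sequence; merge() keeps every date and fills absent values with NA.
full   <- data.frame(date = seq(min(obs$date), max(obs$date), by = "day"))
filled <- merge(full, obs, by = "date", all.x = TRUE)

# With NA rows in place, type = "l" leaves a break instead of bridging the gap.
plot(filled$date, filled$value, type = "l", xlab = "Date", ylab = "Value")
```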



As we have seen before in the recipe introducing abline(), we drew a vertical line in the
example by setting the v argument to the date we want to mark. We specified 25/12/2003 as
the x co-ordinate by using the as.Date() function. Note that the original time series plotted
also contains timestamps in addition to the dates. Since we didn't specify a time, the line
was plotted at the start of the specified date, 25/12/2003 00:00.
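On a Date axis, the user x co-ordinate is the number of days since 1970-01-01. This means the value passed to v is simply that day count. The following sketch checks this with a made-up daily series around Christmas 2003.

```r
xmas <- as.Date("25/12/2003", format = "%d/%m/%Y")
print(as.numeric(xmas))   # 12411: the x co-ordinate abline() uses

# Hypothetical daily values around the date of interest.
days  <- seq(as.Date("2003-12-01"), as.Date("2004-01-15"), by = "day")
value <- (seq_along(days) %% 7) * 10

plot(days, value, type = "l", xlab = "Time", ylab = "Value")
abline(v = xmas, col = "red")   # equivalent to abline(v = 12411, col = "red")
```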


There's more... 


Let's look at another example, where we want to draw a vertical marker line on Christmas day 
of every year: 

markers<-seq(from=as.Date("25/12/1998","%d/%m/%Y"),
to=as.Date("25/12/2004","%d/%m/%Y"),
by="year")

abline(v=markers,col="red")
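The markers vector can be checked on its own. This sketch rebuilds it and confirms that by = "year" steps the calendar year while keeping the day and month fixed. It then draws the markers over a made-up monthly series; the dashed lines (lty = 2) are our own choice.

```r
markers <- seq(from = as.Date("25/12/1998", "%d/%m/%Y"),
               to   = as.Date("25/12/2004", "%d/%m/%Y"),
               by   = "year")
print(format(markers, "%d/%m/%Y"))   # one Christmas Day per year, 1998 to 2004

# Hypothetical monthly series spanning the same years.
months <- seq(as.Date("1998-01-01"), as.Date("2004-12-01"), by = "month")
plot(months, seq_along(months), type = "l", xlab = "Time", ylab = "Value")
abline(v = markers, col = "red", lty = 2)
```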


[Figure: Time trend of Oxides of Nitrogen]
Plotting data with varying time
averaging periods 


In this recipe, we will learn how we can plot the same time series data by averaging it over
different time periods using the aggregate() function.


Getting ready 


We will only use the basic R functions for this recipe. Make sure you load the
openair.csv dataset:

air<-read.csv("openair.csv")


How to do it... 


Let's plot the air pollution time series with weekly and daily averages instead of hourly values: 

air$date = as.POSIXct(strptime(air$date, format = "%d/%m/%Y %H:%M",
"GMT"))

means <- aggregate(air["nox"], format(air["date"], "%Y-%U"), mean,
na.rm = TRUE)

means$date <- seq(air$date[1], air$date[nrow(air)],
length = nrow(means))

plot(means$date, means$nox, type = "l")



means <- aggregate(air["nox"], format(air["date"], "%Y-%j"), mean,
na.rm = TRUE)

means$date <- seq(air$date[1], air$date[nrow(air)],
length = nrow(means))

plot(means$date, means$nox, type = "l",
xlab="Time", ylab="Concentration (ppb)",
main="Daily Average Concentrations of Oxides of Nitrogen")




The key function in these examples is the aggregate() function. Its first argument is the
R object x to be aggregated, in this case air["nox"]. The next argument is
the list of grouping elements over which x is aggregated. This is where we
specify the time period over which to average the values. In the first example, we set it to
format(air["date"], "%Y-%U"), which converts each timestamp into a string made of the year and
the week number (weeks start on Sunday), so that all the values from the same week fall into the same group.
The third argument is FUN, the function to apply to each group, in our case mean. Finally, we set na.rm to TRUE,
which aggregate() passes on to mean(), telling R to ignore missing values denoted by NA.
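To see what the grouping strings look like, here is a small sketch with made-up hourly values covering two full weeks. The data start on Sunday 5 January 2003, so each week is complete. The by = list(week = ...) form is a named variant of the call above; the only difference is the name of the grouping column in the result.

```r
# Hypothetical hourly readings: 10 throughout week 1, 20 throughout week 2.
times  <- seq(as.POSIXct("2003-01-05 00:00", tz = "GMT"), by = "hour",
              length.out = 24 * 14)
hourly <- data.frame(date = times, nox = rep(c(10, 20), each = 24 * 7))

# "%Y-%U" = year plus week number (weeks start on Sunday).
week <- format(hourly$date, "%Y-%U", tz = "GMT")
print(unique(week))   # "2003-01" "2003-02"

weekly <- aggregate(hourly["nox"], by = list(week = week), FUN = mean,
                    na.rm = TRUE)
print(weekly)
```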


Once we have the mean values saved in a data frame, we add a date column to this new data frame
using the seq() function, which generates nrow(means) evenly spaced times between the first and
last timestamps in air. We then plot the means against these dates using plot().

In the second example, we use format(air["date"], "%Y-%j") to calculate daily means; %j is the day of the year (001 to 366).
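The same pattern with "%Y-%j" gives one group per calendar day. This sketch uses three days of made-up hourly data. One value is set to NA to show the effect of na.rm = TRUE.

```r
# Hypothetical hourly readings for 24 to 26 December 2003.
times <- seq(as.POSIXct("2003-12-24 00:00", tz = "GMT"), by = "hour",
             length.out = 72)
nox <- rep(c(5, 15, 25), each = 24)
nox[30] <- NA   # a missing reading on 25 December

# "%Y-%j" = year plus day of the year; 24 December 2003 is day 358.
day <- format(times, "%Y-%j", tz = "GMT")

# Without na.rm = TRUE, the mean for "2003-359" would be NA.
daily <- aggregate(data.frame(nox = nox), by = list(day = day), FUN = mean,
                   na.rm = TRUE)
print(daily)
```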


Creating stock charts 


Given R's powerful analysis and graphical capabilities, it is no surprise that R is very popular 
in the world of finance. In this recipe, we will learn how to plot data from the stock market 
using some special libraries. 
Getting ready


We need the tseries and quantmod packages to run the following recipes. Let's install and
load these two packages:

install.packages(c("tseries","quantmod"))
library(tseries)
library(quantmod)

Let's first see an example using the tseries package function get.hist.quote(). We will
compare stock prices of three technology companies:

aapl<-get.hist.quote(instrument="aapl",quote=c("Cl","Vol"))
goog<-get.hist.quote(instrument="goog",quote=c("Cl","Vol"))
msft<-get.hist.quote(instrument="msft",quote=c("Cl","Vol"))

plot(msft$Close,main="Stock Price Comparison",
ylim=c(0,800),col="red",type="l",lwd=0.5,
pch=19,cex=0.6,xlab="Date",ylab="Stock Price (USD)")

lines(goog$Close,col="blue",lwd=0.5)

lines(aapl$Close,col="gray",lwd=0.5)

legend("top",horiz=T,legend=c("Microsoft","Google","Apple"),
col=c("red","blue","gray"),lty=1,bty="n")


[Figure: Stock Price Comparison]
The get.hist.quote() function retrieves historical financial data from one of two
providers: "yahoo" (Yahoo) or "oanda" (OANDA), with "yahoo" being the default. We passed the
instrument and quote arguments to this function, which specify the name of the stock and
the measures of stock data we want. In our example, we used the function three times to pull
the closing price and volume for Microsoft (msft), Google (goog), and Apple (aapl). We then
plotted the three stock prices on a line graph using the plot() and lines() functions.
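get.hist.quote() needs a network connection. Here is an offline sketch of the same plot(), lines(), and legend() layering, using three made-up price series. The key point is that the first plot() call fixes the axis limits, so ylim must be wide enough for every series added later. Computing it with range() is one way to ensure that.

```r
days <- seq(as.Date("2011-01-03"), by = "day", length.out = 10)
s1 <- 10:19                   # hypothetical prices
s2 <- seq(50, 95, by = 5)     # hypothetical prices
s3 <- seq(30, 48, by = 2)     # hypothetical prices

lims <- range(s1, s2, s3)     # y limits covering all three series

plot(days, s1, type = "l", col = "red", ylim = lims,
     main = "Stock Price Comparison", xlab = "Date", ylab = "Stock Price (USD)")
lines(days, s2, col = "blue")
lines(days, s3, col = "gray")
legend("top", horiz = TRUE, legend = c("Series 1", "Series 2", "Series 3"),
       col = c("red", "blue", "gray"), lty = 1, bty = "n")
```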


There's more... 


Now let's make some charts using the quantmod package. This package provides built-in
graphics functions to visualize stock data:

getSymbols("AAPL",src="yahoo")
barChart(AAPL)



First we obtained stock data for Apple using the getSymbols() function by specifying the
stock name and source. Again, the default source is Yahoo. The stock data is stored in an R
object with the same name as the stock symbol (AAPL for Apple, GOOG for Google, and so on).
Then we passed this object to the barChart() function to produce the graph. Despite its
name, the result is more than just a bar chart: it draws a bar for each day's open, high, low, and close prices, with trading volume in a panel below.











A similar chart in a different color scheme can be drawn as follows: 
candleChart(AAPL,theme="white")



For more detailed information about the quantmod package, visit its website at:
http://www.quantmod.com
Creating Bar, Dot, and Pie Charts


In this chapter, we will cover: 

► Creating bar charts with more than one factor variable 

► Creating stacked bar charts 

► Adjusting the orientation of bars—horizontal and vertical 

► Adjusting bar widths, spacing, colors, and borders 

► Displaying values on top of or next to the bars 

► Placing labels inside bars 

► Creating bar charts with vertical error bars 

► Modifying dot charts by grouping variables 

► Making better readable pie charts with clockwise-ordered slices 

► Labelling a pie chart with percentage values for each slice 

► Adding a legend to a pie chart 


Introduction 


In this chapter, we will look in some detail at bar charts, dot charts, and pie charts. Bar charts
are used commonly both in reporting business data and also in scientific analysis. We will
see how we can enhance the basic bar charts in R by adjusting some parameters in the base
graphics library. There are a few different packages which can be used to make bar charts
(most notably lattice and ggplot2). But in this chapter, we will see how we can create
many useful variations of bar graphs only by using the base library functions.
We will also look at a few recipes on pie charts—easily the most criticized type of chart in the
scientific community, but also one of the most popular in the business world. While it is true 
that pie charts often obscure the data and are hard to read, the recipes in this chapter offer 
some ways to make pie charts more readable. 

Some of the parameters are obscure, and it may not always be clear what
values an argument can take. It is best to experiment as you go along and try out the recipes.
You may not understand a function or its arguments fully until you have tried to graph a few
of your own datasets. If you get stuck at any point, first look at the help file of the relevant
function. If you are still stuck after having read the help files, then you may search the R
mailing list (http://www.r-project.org/mail.html) and forums (http://r.789695.n4.nabble.com/
and http://stackoverflow.com/questions/tagged/r). Often,
the problems one comes across are common and may have already been addressed by
the R community in response to someone else's question.


Creating bar charts with more than one 
factor variable 


In this first recipe, we will learn how to make bar charts for data with more than one category. 
Such bar charts are commonly used for comparing values of the same measure across 
different categories. 


Getting ready 


We will be using the base library barplot() function, but we will also use the
RColorBrewer package to choose a good color palette. So let's first install and
load that package:

install.packages("RColorBrewer") #if not already installed
library(RColorBrewer)


How to do it... 


Let's use the citysales.csv example dataset that we used in the first chapter once again:
citysales<-read.csv("citysales.csv")

barplot(as.matrix(citysales[,2:4]), beside=TRUE,
legend.text=citysales$City,
args.legend=list(bty="n",horiz=TRUE),
col=brewer.pal(5,"Set1"),
border="white",ylim=c(0,100),
ylab="Sales Revenue (1,000's of USD)",
main="Sales Figures")
[Figure: Sales Figures; grouped bars for ProductA, ProductB, and ProductC; legend: Seattle, London, Tokyo, Berlin, Mumbai]


box(bty="l")


The key argument for drawing bar charts with more than one category is the beside
argument, which must be set to TRUE. The first argument is the input data, which must be in
the form of a matrix. The columns of the matrix are the categories (in this case ProductA,
ProductB, and ProductC), while each row (one per city) supplies one bar within every category. If we do
not set the beside argument to TRUE, we will get a stacked bar chart (as we will see later in
this chapter).

Most of the other arguments of the barplot() function work the same way as they do for
plot(). The args.legend argument takes a list of arguments and passes them on to the
legend() function. Alternatively, we can call the legend() function separately after the
barplot() call.
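Here is a self-contained sketch of the grouped chart. The citysales values are typed in from the long-form table printed in the dot chart recipe later in this chapter. Named colors stand in for the RColorBrewer palette, so no extra package is needed. The sketch also shows what barplot() returns when beside = TRUE.

```r
# citysales: rows are cities, columns are products.
m <- matrix(c(23, 89, 24, 36,  3,    # ProductA
              11,  6,  7, 34, 78,    # ProductB
              12, 56, 13, 44, 14),   # ProductC
            nrow = 5,
            dimnames = list(c("Seattle", "London", "Tokyo", "Berlin", "Mumbai"),
                            c("ProductA", "ProductB", "ProductC")))

mids <- barplot(m, beside = TRUE,
                legend.text = rownames(m),
                args.legend = list(bty = "n", horiz = TRUE),
                col = c("red", "blue", "darkgreen", "purple", "orange"),
                border = "white", ylim = c(0, 100),
                ylab = "Sales Revenue (1,000's of USD)", main = "Sales Figures")

# With beside = TRUE the return value is a matrix of bar centers:
# one row per city, one column per product group.
print(dim(mids))   # 5 3
```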


See also


In the next recipe, we will learn how to make stacked bar charts.
Creating stacked bar charts


Stacked bar charts are another form of bar charts used to compare values across categories. 
As the name implies, the bars for each category are stacked on top of each other instead of 
being placed next to each other. 


Getting ready 


We will use the same dataset and color scheme as the last recipe, so please ensure you have
the RColorBrewer package installed and loaded:


How to do it...


Let's draw a stacked bar chart of sales figures across the five cities:
citysales<-read.csv("citysales.csv")

barplot(as.matrix(citysales[,2:4]),
legend.text=citysales$City,
args.legend=list(bty="n",horiz=TRUE),
col=brewer.pal(5,"Set1"),border="white",
ylim=c(0,200),ylab="Sales Revenue (1,000's of USD)",
main="Sales Figures")
If you compare the code for this example with the code from the last recipe, you will see that the main
difference is that we did not use the beside argument. By default, it is set to FALSE,
which results in a stacked bar chart. We also extended the upper Y axis limit from 100 to 200,
so that the taller stacked bars fit in the plot.


There's more...


Another common use of stacked charts is to compare the relative proportion of values across
categories. Let's use the example dataset citysalesperc.csv, which contains the
percentage values of sales data by city for each of the three products A, B, and C:

citysalesperc<-read.csv("citysalesperc.csv")

par(mar=c(5,4,4,8),xpd=T)

barplot(as.matrix(citysalesperc[,2:4]),
col=brewer.pal(5,"Set1"),border="white",
ylab="Percentage of Sales",
main="Percentage Sales Figures")

legend("right",legend=citysalesperc$City,bty="n",
inset=c(-0.3,0),fill=brewer.pal(5,"Set1"))


[Figure: Percentage Sales Figures]


In the graph, the Y axis shows the percentage of sales of a product in a city. It is a good way to
quickly visually compare the relative proportion of product sales in cities. The code we used
for the main graph is the same as the previous example. One difference is that we drew the
legend separately using the legend() command. Note that we drew the legend outside the
plot region by setting the x part of inset to a negative value. We also had to create a larger
margin to the right using the mar argument in the par() function, and set xpd to
TRUE to allow the legend to be drawn outside the plot region.
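Here is a sketch of both stacked variants, using the same typed-in citysales values. The percentage version is computed here with prop.table(). This is our own assumption: we do not know exactly how citysalesperc.csv was prepared, so its values may differ. The par() settings are saved and restored so that later plots are not affected.

```r
m <- matrix(c(23, 89, 24, 36, 3,  11, 6, 7, 34, 78,  12, 56, 13, 44, 14),
            nrow = 5,
            dimnames = list(c("Seattle", "London", "Tokyo", "Berlin", "Mumbai"),
                            c("ProductA", "ProductB", "ProductC")))
pal <- c("red", "blue", "darkgreen", "purple", "orange")

print(colSums(m))   # stack heights 175, 136, 139: hence ylim = c(0, 200)

# Stacked bars (beside = FALSE): one center per stack is returned.
mids <- barplot(m, col = pal, border = "white", ylim = c(0, 200),
                ylab = "Sales Revenue (1,000's of USD)", main = "Sales Figures")

# Percentages within each product: every column sums to 100.
perc <- prop.table(m, margin = 2) * 100

old <- par(mar = c(5, 4, 4, 8), xpd = TRUE)   # wider right margin, draw outside
barplot(perc, col = pal, border = "white",
        ylab = "Percentage of Sales", main = "Percentage Sales Figures")
legend("right", legend = rownames(m), bty = "n",
       inset = c(-0.3, 0), fill = pal)
par(old)
```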


Adjusting the orientation of bars — horizontal 
and vertical 


In this recipe, we will learn how to adjust the orientation of bars to horizontal or vertical. 


Getting ready 


We will use the same dataset we used in the last few recipes (citysales.csv) and the
RColorBrewer color palette package.


Let's make a bar chart with horizontal bars: 

barplot(as.matrix(citysales[,2:4]), beside=TRUE,horiz=TRUE,
legend.text=citysales$City, args.legend=list(bty="n"),
col=brewer.pal(5,"Set1"),border="white",
xlim=c(0,100), xlab="Sales Revenue (1,000's of USD)",
main="Sales Figures")



In the example, we set the horiz argument to TRUE, which makes the bars horizontal. By
default, horiz is set to FALSE, making the bars vertical. While it's really easy to make the
bars horizontal, we must remember that the axes are swapped when we do that. So, in the
example, we had to set the limits for the X axis (xlim instead of ylim) and set xlab (instead
of ylab) to "Sales Revenue". We also removed the horiz=TRUE argument from the
legend arguments list, because that would have plotted some of the legend labels on top of
the ProductC bars. Removing the horiz argument puts the legend back into its default top-right
position.


There's more... 


Let's draw the stacked bar chart from the last recipe with horizontal bars:
par(mar=c(5,4,4,8),xpd=T)

barplot(as.matrix(citysalesperc[,2:4]), horiz=TRUE,
col=brewer.pal(5,"Set1"),border="white",
xlab="Percentage of Sales",
main="Percentage Sales Figures")

legend("right",legend=citysalesperc$City,bty="n",
inset=c(-0.3,0),fill=brewer.pal(5,"Set1"))
Again, we simply had to set the horiz argument to TRUE and adjust the margins to
accommodate the legend to the right, outside the plot region.


Adjusting bar widths, spacing, colors,
and borders


In this recipe, we will learn how to adjust the styling of bars by setting their width, the space
between them, colors, and borders.


Getting ready


We will continue using the citysales.csv example dataset in this recipe. Make sure you
have loaded it into R and type the recipe at the R prompt. You may also want to save the
recipe as a script so that you can easily run it again later.


How to do it...


Let's adjust all the arguments at once to make the same graph as in the first recipe but with
different visual settings:

barplot(as.matrix(citysales[,2:4]), beside=TRUE,
legend.text=citysales$City, args.legend=list(bty="n",horiz=T),
col=c("#E5562A","#491A5B","#8C6CA8","#BD1B8A","#7CB6E4"),
border=FALSE,space=c(0,5),
ylim=c(0,100),ylab="Sales Revenue (1,000's of USD)",
main="Sales Figures")
Firstly, we changed the colors of the bars by setting the col argument to a vector of five
colors we chose by hand, instead of using an RColorBrewer palette. If we do not set the col
argument, R automatically uses shades of gray.

Next, we set the border argument to FALSE. This tells R not to draw borders around each
individual bar. By default, black borders are drawn around bars, but they usually don't look very
good. That's why we set border to "white" in the earlier recipes of this chapter.

Finally, we set the space argument to c(0,5), a vector of two numbers, which sets the space
between bars within each category and between the groups of bars representing each
category, respectively. Spaces are measured as a fraction of the average bar width. We left no space between bars within a category and increased the
space between categories.

Adjusting the space between bars automatically adjusts the width of the bars too. There is
also a width argument, but setting a single width value has no visible effect unless xlim is also
specified, because the axis is simply rescaled to fit the bars. So, it's
best to use space instead.
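You can check the effect of space numerically, because barplot() returns the bar centers. This sketch uses the typed-in citysales values to compare space = c(0,5) with the default c(0,1) for grouped bars. It uses border = NA, which is the value the barplot() help page gives for omitting borders.

```r
m <- matrix(c(23, 89, 24, 36, 3,  11, 6, 7, 34, 78,  12, 56, 13, 44, 14),
            nrow = 5,
            dimnames = list(c("Seattle", "London", "Tokyo", "Berlin", "Mumbai"),
                            c("ProductA", "ProductB", "ProductC")))
cols <- c("#E5562A", "#491A5B", "#8C6CA8", "#BD1B8A", "#7CB6E4")

# No gap inside a group, five bar widths between groups.
wide <- barplot(m, beside = TRUE, col = cols, border = NA,
                space = c(0, 5), ylim = c(0, 100))

# Default spacing for grouped bars is c(0, 1).
narrow <- barplot(m, beside = TRUE, col = cols, border = NA, ylim = c(0, 100))

# Distance between the first bars of neighbouring groups:
# 5 bars + 5 spaces = 10 widths, versus 5 bars + 1 space = 6 widths.
print(wide[1, 2] - wide[1, 1])       # 10
print(narrow[1, 2] - narrow[1, 1])   # 6
```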


There's more... 


The following is an example showing the previous graph with the default settings for color,
spacing, and borders:

barplot(as.matrix(citysales[,2:4]), beside=T,
legend.text=citysales$City, args.legend=list(bty="n",horiz=T),
ylim=c(0,100),ylab="Sales Revenue (1,000's of USD)",
main="Sales Figures")
Displaying values on top of or next to
the bars


Sometimes it is useful to have the exact values displayed on a bar chart to enable quick and
accurate reading. barplot() has no built-in option to do this. In this recipe, we will learn how
to do it by writing some custom code.


Getting ready


Once again we will use the citysales.csv dataset and build upon the graph from the first
recipe in this chapter.


Let's make the graph with vertical bars and display the sales values just on top of the bars:

x<-barplot(as.matrix(citysales[,2:4]), beside=TRUE,
legend.text=citysales$City, args.legend=list(bty="n",horiz=TRUE),
col=brewer.pal(5,"Set1"),border="white",
ylim=c(0,100),ylab="Sales Revenue (1,000's of USD)",
main="Sales Figures")

y<-as.matrix(citysales[,2:4])
text(x,y+2,labels=as.character(y))
In the example, we used the text() function to label the bars with the corresponding
values. To do so, we constructed two objects, x and y, with the X and Y co-ordinates of the
labels. We first created the bar plot and saved the result as an R object called x. When the result of
the barplot() function call is assigned to an object, the X co-ordinates of the center of each
bar are returned and saved in that object (as a matrix with one column per group when beside=TRUE). You can verify this by
typing x at the R prompt and hitting Enter.

For y, we created a matrix of the sales value columns. Finally, we passed the x
and y values to text() as co-ordinates and set the labels argument to the y values converted
into characters using the as.character() function. Note that we added 2 to each y value
so that the labels are placed slightly above the bars. We may have to add a different value
depending on the scale of the graph.
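Here is a self-contained sketch of the same labelling, using the typed-in citysales values. Because x and the value matrix have the same shape, each label lines up with its own bar. As an alternative to a hand-tuned offset such as +2, text() also accepts pos = 3, which places each label just above its point; that variant is shown commented out.

```r
m <- matrix(c(23, 89, 24, 36, 3,  11, 6, 7, 34, 78,  12, 56, 13, 44, 14),
            nrow = 5,
            dimnames = list(c("Seattle", "London", "Tokyo", "Berlin", "Mumbai"),
                            c("ProductA", "ProductB", "ProductC")))

x <- barplot(m, beside = TRUE, ylim = c(0, 100), border = "white",
             col = c("red", "blue", "darkgreen", "purple", "orange"))

# Fixed offset in data units, as in the recipe.
text(x, m + 2, labels = as.character(m))

# Offset that does not depend on the axis scale:
# text(x, m, labels = as.character(m), pos = 3)
```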


There's more... 


We can place the value labels next to the bars in a horizontal bar chart simply by swapping
the x and y vectors in the text() function call:

y<-barplot(as.matrix(citysales[,2:4]), beside=TRUE,horiz=TRUE,
legend.text=citysales$City, args.legend=list(bty="n"),
col=brewer.pal(5,"Set1"),border="white",
xlim=c(0,100), xlab="Sales Revenue (1,000's of USD)",
main="Sales Figures")

x<-as.matrix(citysales[,2:4])

text(x+2,y,labels=as.character(x))
See also


In the next recipe, we will learn how to place text labels inside bars. 


Placing labels inside bars 


Sometimes we may wish to label bars by placing text inside the bars instead of using a legend. 
In this recipe, we will learn how to do that based on code similar to the previous recipe. 


Getting ready 


We will use the cityrain.csv example dataset. We don't need to load any additional 
packages for this recipe. 



We will plot the rainfall in the month of January in four cities as a horizontal bar chart:
rain<-read.csv("cityrain.csv")

y<-barplot(as.matrix(rain[1,-1]),horiz=T,col="white",
yaxt="n",main="Rainfall in January",xlab="Rainfall (mm)")

x<-0.5*rain[1,-1]
text(x,y,colnames(rain[-1]))


[Figure: Rainfall in January; X axis: Rainfall (mm)]
How it works...


The example is very similar to the one in the previous recipe. The only difference is that now
we are plotting one set of bars, not groups of bars. Because we want to place the labels inside
the bars, we turned off the Y axis labels by setting yaxt="n". Otherwise, the city names
would appear along the Y axis to the left of the bars. We retrieved the Y co-ordinates of
the bars by assigning the result of the barplot() call to y. We created the vector x so as to place the
labels in the middle of each bar by multiplying the rainfall values by 0.5. Note that
these X co-ordinates represent the center of each label, not its start. Finally, we passed the
x and y co-ordinates and the city names to text() to label the bars.
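Because cityrain.csv is not reproduced here, this sketch uses made-up rainfall values and placeholder city names. The values are a plain vector, so barplot() returns one Y co-ordinate per bar.

```r
rain   <- c(52, 88, 71, 40)                           # hypothetical values (mm)
cities <- c("City A", "City B", "City C", "City D")   # placeholder names

y <- barplot(rain, horiz = TRUE, col = "white", yaxt = "n",
             main = "Rainfall in January", xlab = "Rainfall (mm)")

# Half of each bar length is the horizontal middle of that bar;
# text() centers each label on its point by default.
x <- 0.5 * rain
text(x, y, cities)
```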


There's more... 


As we have seen in the example and the previous recipe, once we retrieve the x or y 
co-ordinates of the center of bars, we can place labels in any position relative to those 
co-ordinates. 


Creating bar charts with vertical error bars 


Bar charts with error bars are commonly used in analysing and reporting results of scientific 
experiments. In this recipe, we will learn how to add error bars to a bar chart in a similar way 
to the recipe for scatter plots in Chapter 3. 


Getting ready 


We will continue using the citysales.csv example dataset in this recipe. Make sure you
have loaded it into R and type the recipe at the R prompt. You may also want to save the
recipe as a script so that you can easily run it again later.


How to do it... 


One change we will make in this recipe is that we will use the transpose of the citysales
dataset (which turns rows into columns and columns into rows). So, first let's create the transpose
as a new dataset:

sales<-t(as.matrix(citysales[,-1]))
colnames(sales)<-citysales[,1]

Now, let's make a bar plot with 5% error bars showing the sales of the three products across 
the five cities as categories: 


x<-barplot(sales, beside=T, legend.text=rownames(sales),
args.legend=list(bty="n",horiz=T),
col=brewer.pal(3,"Set2"),border="white",ylim=c(0,100),
ylab="Sales Revenue (1,000's of USD)",
main="Sales Figures")

arrows(x0=x,y0=sales*0.95,
x1=x,y1=sales*1.05,
angle=90,
code=3,
length=0.04,
lwd=0.4)



We first created the bar chart with the transposed data, so that the sales data are
represented as groups of three products for each of the cities. We saved the X co-ordinates of
these bars as x. Then we used the arrows() function, just like we used it in Chapter
3 for making error bars on scatter plots. The first four arguments are the X and Y co-ordinate
pairs of the start and end points of the error bars. The X co-ordinates x0 and x1 are both set
equal to x, and the Y co-ordinates are the sales values 5% below and above the original values.

Setting angle to 90 draws each arrow head at right angles to the shaft, which turns it into a flat cap,
and setting code to 3 draws a head at both ends of each arrow; length and lwd set the length of
the arrow head edges (in inches) and the line width of the arrows.
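Here is a self-contained sketch of the same error bars. The citysales values are typed in and transposed so that products are the rows. The band of plus or minus 5% is the recipe's illustrative choice, not a real error estimate, and named colors replace the RColorBrewer palette.

```r
m <- matrix(c(23, 89, 24, 36, 3,  11, 6, 7, 34, 78,  12, 56, 13, 44, 14),
            nrow = 5,
            dimnames = list(c("Seattle", "London", "Tokyo", "Berlin", "Mumbai"),
                            c("ProductA", "ProductB", "ProductC")))
sales <- t(m)   # 3 products (rows) x 5 cities (columns)

x <- barplot(sales, beside = TRUE, legend.text = rownames(sales),
             args.legend = list(bty = "n", horiz = TRUE),
             col = c("salmon", "skyblue", "palegreen"), border = "white",
             ylim = c(0, 100), ylab = "Sales Revenue (1,000's of USD)",
             main = "Sales Figures")

# angle = 90 turns the heads into flat caps; code = 3 puts one at each end.
arrows(x0 = x, y0 = sales * 0.95,
       x1 = x, y1 = sales * 1.05,
       angle = 90, code = 3, length = 0.04, lwd = 0.4)
```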


[Figure: Sales Figures]
There's more...


The code for drawing the error bars can be saved as a function and used with any bar plot.
This can be especially useful when comparing experimental values with control values, trying
to look for a significant effect:

errorbars<-function(x,y,upper,lower=upper,length=0.04,lwd=0.4,...) {
# Flat caps at y+upper and y-lower; extra arguments (e.g. col) go to arrows()
arrows(x0=x,y0=y+upper,
x1=x,y1=y-lower,
angle=90,
code=3,
length=length,
lwd=lwd,
...)
}

Now, error bars can be added to the previous graph simply
by calling:

errorbars(x,sales,0.05*sales)


In practice, scaled estimated standard deviation values or other formal estimates of error 
would be used for drawing error bars instead of a blanket percentage error as shown here. 
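As a sketch of that, the code below computes the standard error of the mean (the sample standard deviation scaled by 1/sqrt(n)) from made-up replicate measurements. It then draws the error bars with arrows(), using the same cap settings as the errorbars() function above. The group names and values are invented for illustration.

```r
# Hypothetical replicate measurements: 4 per group.
meas <- data.frame(
  group = rep(c("Control", "TreatA", "TreatB"), each = 4),
  value = c(10, 12, 11, 13,   15, 17, 16, 18,   9, 8, 10, 9))

means <- tapply(meas$value, meas$group, mean)
sems  <- tapply(meas$value, meas$group,
                function(v) sd(v) / sqrt(length(v)))   # standard error

x <- barplot(means, ylim = c(0, 25), col = "grey80", border = "white",
             ylab = "Measured value")
arrows(x0 = x, y0 = means + sems, x1 = x, y1 = means - sems,
       angle = 90, code = 3, length = 0.04, lwd = 0.4)
```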


Modifying dot charts by grouping variables 


In this recipe, we will learn how to make dot charts with grouped variables. Dot charts are 
often preferred to bar charts because they are less cluttered and convey the same information 
more clearly with less ink. 


Getting ready 


We will continue using the citysales.csv example dataset in this recipe. Make sure you
have loaded it into R and type the recipe at the R prompt. You may also want to save the
recipe as a script so that you can easily run it again later. We will need the reshape package
to change the structure of the dataset. So let's make sure we have it installed and loaded:

install.packages("reshape")

library(reshape)
dotchart(sales[,3],labels=sales$City,groups=sales[,2],
col=sales$color,pch=19,
main="Sales Figures",
xlab="Sales Revenue (1,000's of USD)")


[Figure: Sales Figures; X axis: Sales Revenue (1,000's of USD)]



We first converted the data into long form by applying the melt() function from the reshape
package. The following is what the new dataset looks like:


We will first apply the melt() function to the citysales dataset to convert it to long form
and then use the dotchart() function:


s a I es <- mel t (c i t y s a I es) 
City      Variable   Value
Mumbai    ProductA       3
London    ProductB       6
Tokyo     ProductB       7
Seattle   ProductB      11
Seattle   ProductC      12
Tokyo     ProductC      13
Mumbai    ProductC      14
Seattle   ProductA      23
Tokyo     ProductA      24
Berlin    ProductB      34
Berlin    ProductA      36
Berlin    ProductC      44
London    ProductC      56
Mumbai    ProductB      78
London    ProductA      89


Then we added a column called color, which holds a different color value for each product
(red, blue, and violet).

Finally, we called the dotchart() function with the values column as the first argument. We set
the labels argument to the city names and grouped the points by the second column (product).
The color is set to the color column using the col argument. This results in a dot chart with
the data points grouped and colored by products on the Y axis.
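If the reshape package is not available, a rough base-R equivalent of the melt() step can be sketched with stack(). The data frame below is rebuilt from the values in the long-form table of this recipe; the City order, the object names and the use of stack() in place of melt() are assumptions for illustration, not the recipe's own code.

```r
# Wide data rebuilt from the values in the long-form table of this recipe
citysales <- data.frame(
  City     = c("Seattle", "London", "Tokyo", "Berlin", "Mumbai"),
  ProductA = c(23, 89, 24, 36, 3),
  ProductB = c(11, 6, 7, 34, 78),
  ProductC = c(12, 56, 13, 44, 14),
  stringsAsFactors = FALSE
)

# stack() stacks the product columns one after another into (values, ind)
long <- stack(citysales[, c("ProductA", "ProductB", "ProductC")])
sales <- data.frame(City = rep(citysales$City, times = 3),
                    variable = long$ind,
                    value = long$values,
                    stringsAsFactors = FALSE)

# One colour per product, looked up by product name
prod_cols <- c(ProductA = "red", ProductB = "blue", ProductC = "violet")
sales$color <- unname(prod_cols[as.character(sales$variable)])

pdf(NULL)  # off-screen device; use a real device to see the chart
dotchart(sales$value, labels = sales$City, groups = sales$variable,
         color = sales$color, pch = 19,
         main = "Sales Figures", xlab = "Sales Revenue (1,000's of USD)")
invisible(dev.off())
```

Looking the colors up from a named vector keeps the product-to-color mapping in one place, instead of one assignment per product.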


Making more readable pie charts with
clockwise-ordered slices


Pie charts are very popular in business data reporting. However, they are not preferred by 
scientists and often criticized for being hard to read and obscuring data. In this recipe, 
we will learn how to make better pie charts by ordering the slices by size. 


Getting ready 


In this recipe, we will use the browsers.txt example dataset, which contains data about
the usage percentage share of different internet browsers. 


First we will load the browsers.txt dataset and then use the pie() function to draw
a pie chart: 

library(RColorBrewer)   # provides brewer.pal()

browsers<-read.table("browsers.txt",header=TRUE)
browsers<-browsers[order(browsers[,2]),]

pie(browsers[,2],
    labels=browsers[,1],
    clockwise=TRUE,
    radius=1,
    col=brewer.pal(7,"Set1"),
    border="white",
    main="Percentage Share of Internet Browser usage")




The important thing about the graph is that the slices are ordered in ascending order of their 
sizes. We have done this because one of the main criticisms of pie charts is that when there 
are many slices and they are in a random order, it is not easy (often impossible) to tell whether 
one slice is bigger than another. By ordering the slices by size in a clockwise direction, we can 
directly compare the slices. 

We ordered the dataset by using the order() function, which returns the indices that would
sort its argument in ascending order. So if we just type order(browsers[,2]) at the R prompt
we get:

[1] 7 6 5 3 2 1 4 
That's a vector of the row indices of the share values, in ascending order, in the original dataset. For
example, Firefox, which has the largest share, is in the fourth row, so the last number in the
vector is 4. We then use this index vector to reorder the rows of the browsers dataset in ascending order of
share by using the square bracket notation.
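To see what order() does in isolation, here is a small sketch on a hypothetical four-row data frame (the names and values are made up). Passing decreasing=TRUE to order() sorts in descending order instead.

```r
# A toy data frame with hypothetical values (not the browsers.txt data)
shares <- data.frame(name = c("P", "Q", "R", "S"),
                     share = c(30, 10, 45, 15),
                     stringsAsFactors = FALSE)

idx <- order(shares$share)    # row positions that sort 'share' ascending
print(idx)                    # 2 4 1 3: the smallest share (10) is in row 2

sorted <- shares[idx, ]       # square-bracket indexing reorders the rows
print(sorted$name)            # "Q" "S" "P" "R"

# decreasing = TRUE gives the opposite order
rev_idx <- order(shares$share, decreasing = TRUE)
```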

Then we pass the share values in the second column as the first argument to the pie()
function of the base R graphics library. We set labels to the first column, the names of
browsers (note IE stands for Internet Explorer). We also set the clockwise argument to
TRUE. By default slices are drawn counterclockwise.


See also 


In the next two recipes, we will see how we can further enhance pie charts with percentage 
value labels. 


Labelling a pie chart with percentage 
values for each slice 


In this recipe, we will learn how to add the percentage values in addition to the names of 
slices, thus making them more readable. 


Getting ready 


Once again in this recipe, we will use the browsers.txt example dataset, which contains
data about the usage percentage share of different internet browsers. 



First we will load the browsers.txt dataset and then use the pie() function to draw
a pie chart: 

library(RColorBrewer)   # provides brewer.pal()

browsers<-read.table("browsers.txt",header=TRUE)
browsers<-browsers[order(browsers[,2]),]

pielabels <- sprintf("%s = %3.1f%s", browsers[,1],
    100*browsers[,2]/sum(browsers[,2]), "%")

pie(browsers[,2],
    labels=pielabels,
    clockwise=TRUE,
    radius=1,
    col=brewer.pal(7,"Set1"),
    border="white",
    cex=0.8,
    main="Percentage Share of Internet Browser usage")




In the example, instead of using just the browser names as labels, we first created a vector
of labels that concatenated the browser names and percentage share values. We used
the sprintf() function, which returns a character vector containing a formatted combination
of text and variable values. The first argument to sprintf() is the full format string in
double quotes, where the % notation is used to fill in values dynamically and thus create
one string for each slice. %s refers to a character string (browsers[,1], which is
the second argument). %3.1f refers to a number in fixed-point notation with one digit after the
decimal point and a minimum width of three characters (the percentage share value calculated
as the third argument). The second %s refers to the character "%" itself, which is the last argument.
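The format string can be checked on its own. This sketch uses three hypothetical slices; it also shows that writing %% inside the format string produces a literal percent sign (an alternative to passing "%" as a separate argument) and that the number before the decimal point in a format such as %6.1f is a minimum field width.

```r
slice_names <- c("A", "B", "C")
values <- c(1, 1, 2)                   # hypothetical raw slice sizes
pct <- 100 * values / sum(values)      # 25, 25, 50

# Percent sign passed as a separate argument, as in the recipe
lab1 <- sprintf("%s = %3.1f%s", slice_names, pct, "%")
# Same result with '%%', which sprintf() turns into a literal '%'
lab2 <- sprintf("%s = %3.1f%%", slice_names, pct)
print(lab1)                            # "A = 25.0%" "B = 25.0%" "C = 50.0%"

# The 6 in %6.1f is a minimum width: 3 is padded on the left to 6 characters
padded <- sprintf("%6.1f", 3)          # "   3.0"
```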


We make the pie chart using the same pie() function call as in the last recipe, except that
we set labels to the newly constructed vector pielabels and shrink the label text with cex=0.8.
There's more...


We can adjust the size of the chart and the text labels by using the radius and cex
arguments respectively.


See also 


In the next recipe we will see how to add a legend to a pie chart. 



Adding a legend to a pie chart


Sometimes we may wish to use a legend to annotate a pie chart instead of using labels. 
In this recipe we will learn how to do that using the legend() function.


Getting ready 


Once again in this recipe, we will use the browsers.txt example dataset, which contains
data about the usage percentage share of different internet browsers. 



First we will load the browsers.txt dataset and then use the pie() function to draw
a pie chart: 

library(RColorBrewer)   # provides brewer.pal()

browsers<-read.table("browsers.txt",header=TRUE)
browsers<-browsers[order(browsers[,2]),]

pielabels <- sprintf("%s = %3.1f%s", browsers[,1],
    100*browsers[,2]/sum(browsers[,2]), "%")

pie(browsers[,2],
    labels=NA,
    clockwise=TRUE,
    col=brewer.pal(7,"Set1"),
    border="white",
    radius=0.7,
    cex=0.8,
    main="Percentage Share of Internet Browser usage")

legend("bottomright", legend=pielabels, bty="n",
    fill=brewer.pal(7,"Set1"))


(Figure: pie chart "Percentage Share of Internet Browser usage" with a legend reading Opera = 2.3%, Safari = 3.6%, Chrome = 9.9%, IE6 = 11.0%, IE7 = 12.9%, IE8 = 13.6%, Firefox = 46.7%)



Once again we ordered the browser dataset, created a vector of labels and made the pie
chart with the pie() function call, just like in the previous recipe. However, we set labels
to NA this time as we want to create a legend instead of labeling the slices directly.

We added a legend to the bottom-right corner by calling the legend() function. We passed
the pielabels vector as the legend argument and set the fill argument to the same
RColorBrewer color palette we used for the pie slices. Setting bty to "n" removes the box around the legend.


There's more... 


Depending on the number of slices and the desired size of the chart, we can experiment with
placing the legend in different places. In this case we have many slice labels; with fewer
slices, we could place the legend in a single row on top of the chart by setting x to "top" and
horiz to TRUE.
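A minimal sketch of the single-row legend described above, using three hypothetical slices and base color names in place of the RColorBrewer palette. legend() invisibly returns the positions of its text, which are kept in leg here to confirm that all entries share one row.

```r
slices <- c(Alpha = 50, Beta = 30, Gamma = 20)   # hypothetical shares
cols <- c("red", "blue", "violet")               # stand-in for brewer.pal()

pdf(NULL)  # off-screen device; use a real device to see the chart
pie(slices, labels = NA, clockwise = TRUE, col = cols,
    border = "white", radius = 0.7, main = "Legend in a single row")
# x = "top" places the legend at the top; horiz = TRUE lays it out in one row
leg <- legend("top", legend = names(slices), fill = cols,
              horiz = TRUE, bty = "n")
invisible(dev.off())
```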
Creating Histograms


In this chapter, we will cover: 

► Visualizing distributions as count frequencies or probability densities 

► Setting bin size and number of breaks 

► Adjusting histogram styles: bar colors, borders, and axes 

► Overlaying density line over a histogram 

► Multiple histograms along the diagonal of a pairs plot 

► Histograms in the margins of line and scatter plots 


Introduction 


In this chapter, we will look in some detail at histograms, which are a very useful form of 
visualization to quickly see the distribution of values of a variable. They are usually one 
of the first graphs looked at to see whether a variable follows a normal distribution or 
has a skewed distribution. 

We will see how we can enhance the basic histogram in R by adjusting some parameters in 
the base graphics library. We will learn how to change certain settings to control the format 
in which the histogram is plotted (frequency or probability of values) and also how the values 
are grouped into bins. We will also look at the usual parameters for changing the styling 
of histogram bars, such as color, width, and border. In addition, we will also look at some 
advanced recipes combining histograms with other types of graphs. 


As with the previous chapters, it is best to try out each recipe first with the example shown 
here and then with your own datasets so that you can fully understand each line of code. 
Visualizing distributions as count
frequencies or probability densities 


Histograms can represent the distribution of values either as frequency (the absolute number 
of times values fall within specific ranges) or as probability density (the proportion of the values 
that fall within specific ranges). In this recipe, we will learn how to choose one or the other.


Getting ready 


We are only using base graphics functions for this recipe. So, just open up the R prompt and 
type the following code. We will use the airpollution.csv example dataset for this recipe.
So let's first load it:

air <- read.csv("airpollution.csv")



We will use the base graphics function hist() to make our histogram, first showing
frequency and then probability density of Nitrogen Oxide concentrations:

hist(air$Nitrogen.Oxides,
    xlab="Nitrogen Oxide Concentrations",
    main="Distribution of Nitrogen Oxide Concentrations")


Now, let's make the same histogram but with probability instead of frequency: 


hist(air$Nitrogen.Oxides, freq=FALSE,
    xlab="Nitrogen Oxide Concentrations",
    main="Distribution of Nitrogen Oxide Concentrations")


The first example, showing the frequency counts of different value ranges of Nitrogen Oxides,
simply uses a call to the hist() function in the base graphics library. The variable is
passed as the first argument and by default the histogram plotted shows frequency. In the
second example, we passed an extra argument, freq, and set it to FALSE, which results in a
histogram showing probability densities. This suggests that by default freq is set to TRUE.
The help page for hist() (?hist) states that freq defaults to TRUE if and only if breaks
are equidistant and probability is not specified.


An alternative to using the freq argument is the prob argument (short for probability, which
R accepts through partial matching), which, as the name suggests, takes the opposite value to freq.
So, by default, it is set to FALSE, and if we want to show probability densities then we need to set prob to TRUE.
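The relationship between the two scales can be checked numerically with hist(plot=FALSE), which returns the counts and densities without drawing. This sketch uses simulated data (an assumption, not the air pollution dataset): the counts add up to the sample size, while the densities multiplied by the bin widths add up to 1.

```r
set.seed(42)
x <- rexp(500, rate = 1 / 100)     # hypothetical right-skewed sample

h <- hist(x, plot = FALSE)         # default (Sturges) breaks, equally spaced
widths <- diff(h$breaks)

total_count <- sum(h$counts)                 # frequencies add up to length(x)
total_area  <- sum(h$density * widths)       # density bars have total area 1

# density = count / (n * bin width)
dens_check <- all.equal(h$density, h$counts / (length(x) * widths))
```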
Setting bin size and number of breaks


As we saw in the previous recipe, the hist() function automatically computes the number
of breaks and size of bins in which to group the values of the variable. In this recipe, we will
learn how we can control that and specify exactly how many bins we want or where to have
breaks between bars.


Getting ready


Once again, we will use the airpollution.csv example dataset, so make sure you have
loaded it:

air <- read.csv("airpollution.csv")


How to do it...


First, let's see how to specify the number of breaks. Let's make 20 breaks in the Nitrogen
Oxides histogram instead of the default 11:

hist(air$Nitrogen.Oxides,
    breaks=20, xlab="Nitrogen Oxide Concentrations",
    main="Distribution of Nitrogen Oxide Concentrations")

How it works...


We used the breaks argument to specify the number of bars for the histogram. We set
breaks to 20; however, the graph shows more than 20 bars because R treats a single number
only as a suggestion: it computes round-valued break points (using the pretty() function)
that give roughly the requested number of bins.
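The following sketch, on simulated data, suggests how a single number passed to breaks is treated: hist() hands it to pretty() as a target, so the resulting break points match a direct pretty() call rather than necessarily giving exactly 20 bins. This mirrors the behavior described in ?hist; the data are hypothetical.

```r
set.seed(1)
x <- rnorm(300, mean = 250, sd = 90)   # hypothetical data

h <- hist(x, breaks = 20, plot = FALSE)
n_bins <- length(h$breaks) - 1          # bins actually used; need not be 20

# A single number is only a target: hist() passes it to pretty(),
# which picks evenly spaced, "round" cut points near that many bins
same_as_pretty <- all.equal(h$breaks, pretty(range(x), n = 20, min.n = 1))
```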


There's more... 


We can also specify the exact values at which we want the breaks to occur. In this case, R 
does use the value we specify. Once again we use the breaks argument but this time we 
have to set it to a numerical vector containing the values at which we want the breaks. The 
breaks vector must cover the full range of values of the X variable. 

Let's say we want breaks at every 100 units of concentration:

hist(air$Nitrogen.Oxides,
    breaks=c(0,100,200,300,400,500,600),
    xlab="Nitrogen Oxide Concentrations",
    main="Distribution of Nitrogen Oxide Concentrations")
So, as you may have noticed, the breaks argument can take different types of values: a
single value suggesting the number of breaks or a vector specifying exact bin breaks. In 
addition, breaks can also take a function which computes the number of bins. 

Finally, breaks can also take a character string naming an algorithm to calculate
the number of bins. By default, it is set to "Sturges". The other algorithms supplied
are "Scott" and "FD" (or "Freedman-Diaconis").
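The sketch below, on hypothetical data, calls the helper functions behind the three built-in rules, nclass.Sturges(), nclass.scott() and nclass.FD(), and shows that naming the algorithm as a string or passing the corresponding function to breaks gives the same break points as the default.

```r
x <- 1:100                     # hypothetical data

nclass.Sturges(x)              # ceiling(log2(100) + 1) = 8
nclass.scott(x)                # Scott's rule, based on the standard deviation
nclass.FD(x)                   # Freedman-Diaconis rule, based on the IQR

h1 <- hist(x, plot = FALSE)                          # default: "Sturges"
h2 <- hist(x, breaks = "Sturges", plot = FALSE)      # algorithm named as a string
h3 <- hist(x, breaks = nclass.Sturges, plot = FALSE) # function applied to x
```

In all three cases the resulting count is still only a suggestion that is passed on to pretty().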


Adjusting histogram styles: bar colors, 
borders, and axes 


The default styling of histograms does not look great and may not be suitable for publications. 
In this recipe, we will learn how to improve the look by setting bar colors, borders, and 
adjusting the axes. 


Getting ready 


Once again we will use the airpollution.csv example dataset. So let's make sure it is loaded by
running the following command at the R prompt:

air <- read.csv("airpollution.csv")

Let's visualize the probability distribution of Respirable Particle Concentrations with black
bars and white borders:

hist(air$Respirable.Particles,
    prob=TRUE, col="black", border="white",
    xlab="Respirable Particle Concentrations",
    main="Distribution of Respirable Particle Concentrations")

By now you may have guessed how to do that yourself. We used the col and border
arguments to set the bar and border colors to black and white respectively.


There's more... 


You may have noticed that in all of the previous examples the X axis is detached from the 
base of the bars. This gives the graphs a bit of an unclean look. Also notice that the Y axis 
labels are rotated vertically, which makes them harder to read. Let's improve the graph by 
fixing these two visual settings: 

par(yaxs="i", las=1)

hist(air$Respirable.Particles,
    prob=TRUE, col="black", border="white",
    xlab="Respirable Particle Concentrations",
    main="Distribution of Respirable Particle Concentrations")

box(bty="l")

grid(nx=NA, ny=NULL, lty=1, lwd=1, col="gray")



So we used a couple of extra function calls to change the look of the graph. First we called
the par() function and set yaxs to "i" so that the Y axis starts exactly at zero and the X axis
meets the base of the bars instead of being detached from them. We also set las equal to 1 to
make all the axis labels horizontal, thus making the Y axis labels easier to read. Then we ran
the hist() function call as before and called box() with bty equal to "l" to make an L-shaped
box running along the axes. Finally, we added horizontal grid lines using the grid() function
(nx=NA suppresses vertical lines, and ny=NULL places horizontal lines at the Y axis tick marks).
Overlaying density line over a histogram


In this recipe we will learn how to superimpose a kernel density line on top of a histogram. 


Getting ready 


We will continue using the airpollution.csv example dataset. You can simply type the
recipe code at the R prompt. If you wish to use the code later, you should save it as a script
file. First, let's load the data file:

air <- read.csv("airpollution.csv")



Let's overlay a line showing the kernel density of Respirable Particle Concentrations on top of 
a probability distribution histogram: 

par(yaxs="i", las=1)

hist(air$Respirable.Particles,
    prob=TRUE, col="black", border="white",
    xlab="Respirable Particle Concentrations",
    main="Distribution of Respirable Particle Concentrations")

box(bty="l")

lines(density(air$Respirable.Particles, na.rm=TRUE), col="red", lwd=4)
grid(nx=NA, ny=NULL, lty=1, lwd=1, col="gray")
The code for the histogram itself is exactly the same as in the previous recipe. After making
the hist() function call, we used the lines() function to plot the density line on top. We
passed the result of the density() function call to the lines() function. The default kernel
used is Gaussian, although other kernels can be specified. Please have a look at the help file
for density() for more details (run ?density at the R prompt).


To make the line prominent, we set its color to red (col="red") and its width to 4 (lwd=4);
it is drawn as a solid line (lty=1), which is the default.
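One practical snag is that the density curve can rise above the tallest bar and be clipped at the top of the plot. A minimal sketch, on simulated data with one missing value, computes the density first and sets ylim from whichever is taller. The variable names and data are hypothetical.

```r
set.seed(7)
# Hypothetical skewed data with one missing value
x <- c(rgamma(200, shape = 2, scale = 10), NA)

d <- density(x, na.rm = TRUE)     # Gaussian kernel by default
h <- hist(x, plot = FALSE)        # hist() drops the NA itself

# Leave room for whichever is taller: the bars or the density curve
ymax <- max(h$density, d$y)

pdf(NULL)  # off-screen device; use a real device to see the chart
hist(x, prob = TRUE, col = "black", border = "white",
     ylim = c(0, ymax), main = "Histogram with kernel density")
lines(d, col = "red", lwd = 4)
invisible(dev.off())
```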


Multiple histograms along the diagonal 
of a pairs plot 


In this recipe, we will look at some slightly advanced code to embed histograms inside another
kind of graph. We learnt how to make pairs plots (a matrix of scatter plots) in Chapter 1 and
Chapter 3. In those pairs plots, the diagonal cells running from the top-left to the bottom-right
showed the names of the variables, while the other cells showed the relationship between
each pair of variables. It would be useful if we could also see the probability distribution of
each variable in the same plot. Here, we will learn how to do that by adding histograms inside
the diagonal cells.


Getting ready 


We will use the inbuilt iris flowers dataset of R. So we need not load any other datasets. We 
can simply type the given code at the R prompt. 



So let's make an enhanced pairs plot showing the relationship between different 
measurements of the iris flower species and how each measurement's values are 
spread across the range: 

panel.hist <- function(x, ...)
{
    # Keep the X limits of the cell and set the Y limits to 0-1.5
    par(usr = c(par("usr")[1:2], 0, 1.5))
    # add=TRUE draws into the existing cell instead of starting a new plot
    hist(x, prob=TRUE, add=TRUE, col="black", border="white")
}

plot(iris[,1:4],
    main="Relationships between characteristics of iris flowers",
    pch=19, col="blue", cex=0.9,
    diag.panel=panel.hist)
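The panel.hist() function above fixes the upper Y limit of each diagonal cell at 1.5, which suits the iris measurements but may clip taller densities in other data. A hedged alternative, adapted from the pattern used in the examples on R's ?pairs help page, rescales the bars so the tallest is 1 and restores the cell's coordinate system on exit. panel.hist.scaled is a hypothetical name, and the sketch calls pairs() directly, which is what plot() does for a data frame.

```r
panel.hist.scaled <- function(x, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))           # put the cell's coordinates back afterwards
  par(usr = c(usr[1:2], 0, 1.5))    # keep X limits, set Y from 0 to 1.5
  h <- hist(x, plot = FALSE)
  y <- h$density / max(h$density)   # tallest bar has height 1
  nB <- length(h$breaks)
  rect(h$breaks[-nB], 0, h$breaks[-1], y, col = "black", border = "white")
}

pdf(NULL)  # off-screen device; use a real device to see the chart
pairs(iris[, 1:4], pch = 19, col = "blue", cex = 0.9,
      diag.panel = panel.hist.scaled)

# Quick check that the panel leaves the coordinate system as it found it
plot.new()
plot.window(xlim = c(4, 8), ylim = c(4, 8))
usr_before <- par("usr")
panel.hist.scaled(iris$Sepal.Length)
usr_after <- par("usr")
invisible(dev.off())
```

Drawing the bars with rect() from a hist(plot=FALSE) object avoids calling hist() in plotting mode inside a panel, so no add argument is needed.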




We first defined the panel.hist() function, which handles how the histograms are
drawn. It is called by the plot() function (which draws a pairs plot when given a data frame)
later, when the argument diag.panel is set to panel.hist.

The panel.hist() function has only two simple lines of code. First, we call the par()
function to set the X and Y limits using the usr argument. To reiterate what we learnt in Chapter
2, the usr argument takes values in the form of a vector c(xmin, xmax, ymin, ymax) giving
the minimum and maximum values on the X and Y axes respectively. In the code, we keep the
X axis limits the same as already set up by the plot() function call. We need to change the Y
axis limits for each diagonal cell because plot() sets them to be the same as the X axis
limits. We need the Y axis limits in terms of the probability density of each variable, so we set them
to 0 and 1.5.
Then we make the hist() function call with the style arguments of our choice and one key
argument, add (set to TRUE), which makes sure the histograms are added to the existing
pairs plot and not drawn as new plots. A panel function must not start a new plot, or it will
break up the pairs plot. So, we can't use the hist() function without setting add to TRUE.


Histograms in the margins of line
and scatter plots


In this recipe, we will learn how to draw histograms in the top and right margins of a bivariate
scatter plot.


Getting ready


We will use the airpollution.csv example dataset for this recipe. So, let's make sure
it is loaded:

air <- read.csv("airpollution.csv")


How to do it...


Let's make a scatter plot showing the relationship between concentrations of Respirable
Particles and Nitrogen Oxides with histograms of both variables in the margins:

#Set up the layout first
layout(matrix(c(2,0,1,3),2,2,byrow=TRUE), widths=c(3,1),
    heights=c(1,3), TRUE)

#Make Scatterplot
par(mar=c(5.1,4.1,0.1,0))
plot(air$Respirable.Particles~air$Nitrogen.Oxides,
    pch=19, col="black",
    xlim=c(0,600), ylim=c(0,80),
    xlab="Nitrogen Oxides Concentrations",
    ylab="Respirable Particle Concentrations")

#Plot histogram of X variable in the top row
par(mar=c(0,4.1,3,0))
hist(air$Nitrogen.Oxides,
    breaks=seq(0,600,100), ann=FALSE, axes=FALSE,
    col="black", border="white")

#Plot histogram of Y variable to the right of the scatterplot
yhist <- hist(air$Respirable.Particles,
    breaks=seq(0,80,10), plot=FALSE)

par(mar=c(5.1,0,0.1,1))
barplot(yhist$density,
    horiz=TRUE, space=0, axes=FALSE,
    col="black", border="white")



The given example is a bit more complex than the recipes we have seen so far. However, if 
we look at each line of code one-by-one we can understand it quite easily. 

First we used the layout() function to divide the graph into separate regions for the scatter
plot and the two histograms. We could also use the par() function with the mfrow argument
instead, but layout() gives us finer control over the height and width of each cell of the
graph. When we use par() with mfrow or mfcol to create a matrix layout, all cells
automatically get equal heights and widths.
The first argument to the layout() function is a matrix specifying the number of rows and
columns the graphics device should be divided into and the location of each figure. Run just
the matrix command from the code at the R prompt to see the resultant matrix:

matrix(c(2,0,1,3),2,2,byrow=TRUE)

     [,1] [,2]
[1,]    2    0
[2,]    1    3

The matrix values shown here mean that the first figure should be drawn in the second row and 
first column (scatter plot), the second figure in the first row and first column (histogram of X 
variable), and the third figure in the second row and second column (histogram of Y variable). 

The other arguments to layout() are widths and heights, which specify the widths and
heights of the columns and rows respectively as vectors. The last argument (respect) is set to TRUE
so that a unit column-width is the same physical measurement on the device as a unit
row-height.

We have chosen this particular layout so that the scatter plot occupies most of the area 
of the graph and the histograms are plotted in a smaller area as they are only giving 
supplementary information. 

Once the layout is created, we draw the plots one by one in the order that we set up the layout 
matrix. So, first we made the scatter plot giving specific X and Y axis limits, so that we can use 
the same limits to plot the histograms with the correct breaks. 

Then we made the histogram of Nitrogen Oxides in the top margin just above the scatter plot. 

We first used the par() function with the mar argument to set the margins so as not to leave
any margin at the bottom, matching the left and right margins to those of the scatter
plot. We specified the breaks exactly as a vector of values spanning the X axis limits of the
scatter plot by using the seq() function. The axes and annotations are suppressed by setting
the axes and ann arguments to FALSE, thus giving the histogram a clean minimal look.

Next, we added the rotated histogram of Respirable Particle Concentrations to the right of
the scatter plot. We had to do this differently from the first histogram because the hist()
function does not have a built-in way to draw the bars horizontally. As we have seen in
earlier chapters, the barplot() function does have such a capability. So, we first created
a histogram object but suppressed its plotting by setting the plot argument to FALSE. Then we passed
the density values from that object to the barplot() function to plot them horizontally by
setting the horiz argument to TRUE. Just like the X axis histogram, we set the breaks of
the Y histogram equal to a sequence matching the Y limits of the scatter plot. We also set
the margins so that the bottom and top margins match those of the scatter plot and the left
margin is zero. Finally, we called the barplot() function to draw the horizontal bars. Note that
we set the space argument equal to zero, otherwise the bars are drawn with gaps between
them by default.
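Since the airpollution.csv file may not be at hand, here is a self-contained sketch of the same marginal-histogram layout using simulated data (the variable names nox and rsp and their values are assumptions). The value returned by layout() is the number of figure regions, which should be 3 for this matrix.

```r
set.seed(123)
# Hypothetical stand-ins for the two air pollution variables
nox <- runif(200, 0, 600)
rsp <- pmin(pmax(10 + 0.1 * nox + rnorm(200, 0, 8), 0), 80)   # kept within 0-80

pdf(NULL)  # off-screen device; use a real device to see the chart

# Figure 1 = scatter plot, 2 = top histogram, 3 = right histogram, 0 = empty
n_fig <- layout(matrix(c(2, 0, 1, 3), 2, 2, byrow = TRUE),
                widths = c(3, 1), heights = c(1, 3), respect = TRUE)

# 1: scatter plot with fixed limits so the histograms can share them
par(mar = c(5.1, 4.1, 0.1, 0))
plot(nox, rsp, pch = 19, xlim = c(0, 600), ylim = c(0, 80),
     xlab = "X variable", ylab = "Y variable")

# 2: histogram of X in the top row, breaks spanning the X limits
par(mar = c(0, 4.1, 3, 0))
hist(nox, breaks = seq(0, 600, 100), ann = FALSE, axes = FALSE,
     col = "black", border = "white")

# 3: histogram of Y drawn sideways with barplot(horiz = TRUE)
yhist <- hist(rsp, breaks = seq(0, 80, 10), plot = FALSE)
par(mar = c(5.1, 0, 0.1, 1))
barplot(yhist$density, horiz = TRUE, space = 0, axes = FALSE,
        col = "black", border = "white")

invisible(dev.off())
```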
Creating Box and
Whisker Plots 


In this chapter, we will cover: 

► Creating box plots with narrow boxes for a small number of variables 

► Grouping over a variable 

► Varying box widths by number of observations 

► Creating box plots with notches 

► Including or excluding outliers 

► Creating horizontal box plots 

► Changing box styling 

► Adjusting the extent of plot whiskers outside the box 

► Showing the number of observations 

► Splitting a variable at arbitrary values into subsets 


Introduction 


In this chapter, we will look in some depth at box and whisker plots, which are a great form of 
visualization to summarize large amounts of data by showing Tukey's five-number summary: 
minimum, lower-hinge, median, upper-hinge, and maximum. Box plots are a good way to spot 
outliers and compare the key statistics for different variables or groups. 
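To see how the five numbers relate to what boxplot() actually draws, here is a small sketch on hypothetical values. fivenum() returns Tukey's summary, while boxplot.stats(), which boxplot() uses by default, pulls the whiskers in to the most extreme points within 1.5 times the box length and reports anything beyond as an outlier.

```r
x <- c(1:9, 40)        # hypothetical values with one extreme point

five <- fivenum(x)     # min, lower hinge, median, upper hinge, max
b <- boxplot.stats(x)  # coef = 1.5 by default, as used by boxplot()

print(five)            # values 1, 3, 5.5, 8, 40
print(b$stats)         # upper whisker stops at 9, the largest non-outlier
print(b$out)           # 40 lies beyond 8 + 1.5 * (8 - 3) = 15.5
```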

We will learn various stylistic and structural variations on how to adjust box plots in R (using 
the basic boxplot() command). In addition to changing the look of our box plots, we
will also learn how to add additional useful information to them. We will start by looking at 
some basic arguments to change individual aspects of a box plot and slowly move to more 
advanced recipes involving the use of multiple function calls and arguments to create more 
complex types of box plots. 







As with the previous chapters, it is best to try out each recipe first with the example shown 
here and then with your own datasets so that you can fully understand each line of code. 


Creating box plots with narrow boxes 
for a small number of variables 


R automatically adjusts the widths of boxes in a box plot according to the number of variables. 
This works fine when we have a relatively large number of variables (more than four), but you 
may find that for a small number of variables the default boxes are too wide. In this recipe, we 
will learn how to make the boxes narrower. 


Getting ready 


We are only using the base graphics functions for this recipe. So, just open up the R prompt 
and type the following code. We will use the airpollution.csv example dataset for this
recipe. So let's first load it:

air <- read.csv("airpollution.csv")



We want to make a box plot summarizing the two columns in our dataset: Respirable
Particles and Nitrogen Oxides. If we simply use the boxplot() command we get a box
plot with very wide boxes:


boxplot(air, las=1)









Let's improve the look of the graph by making the boxes narrower: 
boxplot(air, boxwex=0.2, las=1)
So we changed the width of the boxes by passing the boxwex argument to the boxplot()
command. We set boxwex to a value of 0.2. The value depends on the number of variables
we are plotting, but it should usually be less than 1.

Note that we also passed the las argument with a value of 1 to make the Y axis labels
horizontal. By default, they are parallel to the Y axis, which makes them difficult to read. Since
we want this setting in all our graphs, we can set it globally by calling the par() function:

par(las=1)



Note that we must not close the graphics device if we want to retain the
setting. If we do close the device, we will need to set las to 1 again, either
using the par() function call or within each boxplot() function call.
From now on, it is assumed that we will set las to 1 globally.
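An alternative to relying on the device staying open is to save and restore the setting explicitly: par() returns the old values of whatever it changes, so they can be put back afterwards. A minimal sketch, drawing to an off-screen pdf(NULL) device with hypothetical data:

```r
pdf(NULL)                     # par() settings belong to the current device

old <- par(las = 1)           # returns the previous value(s) it changed
las_during <- par("las")      # 1 while drawing
boxplot(list(a = c(1, 2, 3, 9), b = c(2, 3, 4, 5)),
        main = "Horizontal axis labels")   # hypothetical data

par(old)                      # restore whatever las was before
las_after <- par("las")       # back to the device default, 0
invisible(dev.off())
```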
There's more...


Note that when we specify a width using boxwex the same value is applied to all the boxes in
the plot. There is another argument, width, which can be used to set the relative widths of
boxes. The width argument takes values in the form of a vector containing a value for each
box. For example, if we wanted the box for Respirable Particles twice as wide as Nitrogen
Oxides, we would run:

boxplot(air, width=c(1,2))


See also 


Setting arbitrarily different widths for boxes using the width argument is not a good idea, 
unless the difference in widths conveys another important fact about the data. We will see 
one such example later in the chapter. 


Grouping over a variable 


In this recipe we will see how we can summarize data for a variable with respect to another 
variable in the dataset. We will learn to group over a variable such that a separate box plot is 
created for each group. 


Getting ready 


We are only using the base graphics functions for this recipe. So, just open up the R prompt 
and type the following code. We will use the metals.csv example dataset for this recipe. So
let's first load it:

metals <- read.csv("metals.csv")


How to do it... 


Let's make a box plot showing copper (Cu) concentrations grouped over measurement sites: 
boxplot(Cu ~ Source, data=metals,

    main="Summary of Copper (Cu) concentrations by Site")




The previous box plot works by using the formula notation y ~ group, where y is the variable
whose values are depicted as separate box plots for each value of group.
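The grouping can be inspected without drawing by passing plot=FALSE, which makes boxplot() return the computed statistics instead of a chart. This sketch uses a small hypothetical data frame with three groups.

```r
# Hypothetical measurements at three sites
df <- data.frame(value = c(5, 7, 6, 12, 14, 13, 20, 22),
                 group = factor(c("S1", "S1", "S1", "S2", "S2", "S2", "S3", "S3")))

b <- boxplot(value ~ group, data = df, plot = FALSE)
print(b$names)   # one box per level of group: "S1" "S2" "S3"
print(b$n)       # observations behind each box: 3 3 2
```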


There's more... 


Grouping over a variable works well only when the group variable has a limited number of 
values, such as when it is a category (or factor in terms of an R data type) such as Source
in this example. Grouping over another numerical variable with lots of unique values (say 
Manganese (Mn) concentrations) would result in a graph with too many box plots and not tell 
us much about the data. 

We can also group over more than one category. If we wanted to group over the Source and 
another variable Expt, the experiment number, we could run: 

boxplot(Cu ~ Source*Expt, data=metals,

    main="Summary of Copper (Cu) concentrations by Site")
See also


We will use grouped box plots as examples in the next few recipes. 


Varying box widths by number of observations


In this recipe, we will learn how to vary box widths according to the number of observations
in each group.


Getting ready 


Just like in the previous recipe, we will continue to use the metals.csv example dataset for
this recipe. So let's first load it:


metals <- read.csv("metals.csv")


How to do it...


Let's build a box plot with box widths proportional to the square root of the number of
observations in each group:


boxplot(Cu ~ Source, data=metals, varwidth=TRUE,
        main="Summary of Copper concentrations by Site")


(Figure: Summary of Copper concentrations by Site)


How it works...


In the example, we set the varwidth argument to TRUE, which makes the width of the boxes
proportional to the square roots of the number of observations in the groups.

We can see that the box for Site4 is the narrowest, since it has the fewest observations in
the dataset. Differences in the other boxes' widths may not be so obvious, but this setting
is useful when we are dealing with larger datasets. By default, varwidth is set to FALSE.
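The square-root rule can be checked directly. In this sketch, the groups A, B, and C are made-up data with 100, 25, and 4 observations, so their relative box widths should be sqrt(100), sqrt(25), and sqrt(4), scaled by the widest one; plot = FALSE returns the counts without drawing.

```r
set.seed(42)
# Made-up groups with 100, 25 and 4 observations
grp <- rep(c("A", "B", "C"), times = c(100, 25, 4))
val <- rnorm(length(grp))

b <- boxplot(val ~ grp, varwidth = TRUE, plot = FALSE)
print(b$n)  # observations per group: 100 25 4

# varwidth = TRUE makes widths proportional to sqrt(n);
# relative to the widest box this gives 10/10, 5/10 and 2/10
rel_width <- sqrt(b$n) / max(sqrt(b$n))
print(rel_width)
```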


Creating box plots with notches 


In this recipe, we will learn how to make box plots with notches, which are useful in comparing 
the medians of different groups. 


Getting ready 


We will continue to use the metals.csv example dataset for this recipe. So let's first load it:
metals <- read.csv("metals.csv")


How to do it... 


We shall now see how to make a box plot with notches: 

boxplot(Cu ~ Source, data=metals,
        varwidth=TRUE, notch=TRUE,
        main="Summary of Copper concentrations by Site")




Creating Box and Whisker Plots 



How it works...


In the example, we set the notch argument to TRUE to create notches on each side of the
boxes. If the notches of two boxes do not overlap, this is strong evidence that their medians
differ (roughly at the 5% significance level). In our graph, the notches of the four boxes
overlap, which suggests that the median concentrations at the four sites are not
statistically different from each other.
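boxplot() takes the notch limits from boxplot.stats(), which reports them as conf: the median plus or minus 1.58 times the hinge spread (roughly the interquartile range) divided by the square root of n. The sketch below reproduces that on made-up data; notches_overlap() is a hypothetical helper that applies the overlap rule described above.

```r
x <- 1:9                      # made-up data
s <- boxplot.stats(x)
print(s$stats)                # 1 3 5 7 9 (whisker, hinge, median, hinge, whisker)
print(s$conf)                 # notch limits

# The same limits by hand: median +/- 1.58 * hinge spread / sqrt(n)
by_hand <- s$stats[3] + c(-1.58, 1.58) * (s$stats[4] - s$stats[2]) / sqrt(s$n)
print(by_hand)

# Hypothetical helper: TRUE if the notches of two groups overlap
notches_overlap <- function(a, b) {
  ca <- boxplot.stats(a)$conf
  cb <- boxplot.stats(b)$conf
  ca[1] <= cb[2] && cb[1] <= ca[2]
}
print(notches_overlap(1:9, 2:10))   # medians 5 and 6: notches overlap
print(notches_overlap(1:9, 21:29))  # medians 5 and 25: no overlap
```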


There's more... 


We can set the notch.frac argument to a value between 0 and 1 to adjust the fraction of
the box width that the notches should use. The default value is 0.5, and a value of 1 gives
notches using the entire width of the box, effectively producing a box plot without notches.


Including or excluding outliers 


In this recipe, we will learn how to remove outliers from a box plot. This is usually not a 
good idea because highlighting outliers is one of the benefits of using box plots. However, 
sometimes extreme outliers can distort the scale and obscure the other aspects of a box plot, 
so it is helpful to exclude them in those cases. 


Getting ready 


Let's continue using the metals.csv example dataset. So let's first make sure it's loaded:
metals <- read.csv("metals.csv")



How to do it...


Once again, we will use the base graphics boxplot() function with a specific argument to
make our metal concentrations box plot without outliers:

boxplot(metals[,-1], outline=FALSE,
        main="Summary of metal concentrations by Site \n (without outliers)")


How it works...










We used the outline argument in the boxplot() function call to suppress the drawing of
outliers. By default, outline is set to TRUE. To exclude outliers, we set it to FALSE.
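Setting outline = FALSE only changes what is drawn; the outliers are still computed and returned in the object that boxplot() gives back. This sketch draws made-up data with one extreme value on a null PDF device, so nothing is displayed or saved, and then inspects the result.

```r
x <- c(1:9, 50)   # made-up data with one extreme value

pdf(NULL)                             # null device: nothing is shown or saved
hidden <- boxplot(x, outline = FALSE)
invisible(dev.off())

print(hidden$out)       # 50: the outlier is still computed and returned
print(hidden$stats[5])  # the upper whisker ends at 9
```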


See also 


In the recipe Adjusting the extent of plot whiskers outside the box, later in the chapter, we will 
learn how to extend the whiskers of a box plot, which is another way of eliminating outliers by 
changing the definition of the cut-off value for an outlier. 


Creating horizontal box plots 


In this recipe, we will see how to make box plots with horizontal boxes instead of the default 
vertical ones. 


Getting ready 


We will continue using the base graphics library functions, so we need not load any additional 
package. We just need to run the recipe code at the R prompt. We can also save the code as a 
script to use it later. Here, we will use the metals.csv example dataset again:

metals <- read.csv("metals.csv")
How to do it...


Let's draw the metal concentrations box plot with horizontal boxes:

boxplot(metals[,-1],
        horizontal=TRUE, las=1,
        main="Summary of metal concentrations by Site")


(Figure: Summary of metal concentrations by Site)


How it works...




We simply had to set the horizontal argument in the boxplot() command to TRUE to
make the boxes horizontal. By default, it is set to FALSE.


Note that unlike barplot(), which uses horiz, boxplot() takes an argument named horizontal.
Changing box styling



So far, we have used the default styling for our box plots. In this recipe, we will learn how to 
change the colors, widths, and styles of various elements of a box plot. 


Getting ready 


We will continue using the base graphics library functions, so we need not load any additional 
library or package. We just need to run the recipe code at the R prompt. We can also save the 
code as a script to use it later. Here, we will use the metals.csv example dataset again:

metals <- read.csv("metals.csv")



How to do it...


We can build a box plot with custom colors, widths, and styles in the following way:

boxplot(metals[,-1],
        border="white", col="black", boxwex=0.3,
        medlwd=1, whiskcol="black", staplecol="black",
        outcol="red", cex=0.3, outpch=19,
        main="Summary of metal concentrations by Site")

grid(nx=NA, ny=NULL, col="gray", lty="dashed")


(Figure: Summary of metal concentrations by Site)


How it works...


We have used a few different arguments in the example to change the styling of the box plot.
The first two are col and border, which set the box color and border color respectively. Note
that the border argument also sets the color for the median line, unless it is specified using
the medcol argument.

In the example, in addition to using boxwex for adjusting box widths, we used medlwd to set
the width of the median line. We set the color of the whiskers and staples using whiskcol
and staplecol respectively. The color and symbol type of the outlier points were set using
outcol and outpch respectively. The size of the points was set using the cex argument.


There's more... 


We can set the color, size, and styling for each of the components. If you type ?bxp at the R
prompt, you can see the help section for the bxp() function, which is called by boxplot() to
do the actual drawing. The following is a summary:


Argument to boxplot()                    Corresponding setting

boxlty, boxlwd, boxcol, boxfill          Box outline type, width, color, and fill color

medlty, medlwd, medpch, medcex,          Median line type, line width, point character,
medcol, medbg                            point size expansion, color, and background color

whisklty, whisklwd, whiskcol             Whisker line type, width, and color

staplelty, staplelwd, staplewex,         Staple line type, width, line width expansion,
staplecol                                and color

outlty, outlwd, outwex, outpch,          Outlier line type, line width, line width expansion,
outcex, outcol, outbg                    point character, point size expansion, color, and
                                         background color


Adjusting the extent of plot whiskers outside the box


Sometimes, we may wish to change the definition of outliers in our dataset by changing the 
extent of the whiskers. In this recipe, we will learn how to adjust the extent of whiskers in a 
box plot by passing a simple argument. 









Getting ready 
We will continue using the base graphics library functions, so we need not load any additional
library or package. We just need to run the recipe code at the R prompt. We can also save the
code as a script to use it later. Here, we will use the metals.csv example dataset again:

metals <- read.csv("metals.csv")


How to do it...


Let's draw the metal concentrations box plot with the whiskers closer to the box than with
the default setting used in the last recipe:

boxplot(metals[,-1],
        range=1, border="white", col="black",
        boxwex=0.3, medlwd=1, whiskcol="black",
        staplecol="black", outcol="red", cex=0.3, outpch=19,
        main="Summary of metal concentrations by Site \n (range=1)")



(Figure: Summary of metal concentrations by Site (range=1))


How it works... 


We passed the range argument with a value of 1 to the boxplot() function in order to
reduce the extent of the whiskers. The default value of range is 1.5, and it only takes
non-negative values. The whiskers extend to the most extreme data point that is no more
than range times the interquartile range from the box.
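Because boxplot() hands range on to boxplot.stats() as its coef argument, the effect of different values can be checked without drawing anything. In this sketch on made-up data, the box runs from 3 to 8, so the hinge spread is 5.

```r
x <- c(1:9, 50)   # made-up data: hinges at 3 and 8, one extreme value

for (r in c(1.5, 1, 0)) {
  s <- boxplot.stats(x, coef = r)
  cat("range =", r,
      "| upper whisker =", s$stats[5],
      "| outliers:", if (length(s$out)) s$out else "none", "\n")
}

# By hand for the default 1.5: the whisker stops at the largest value
# no more than 1.5 * (8 - 3) above the upper hinge
upper_fence <- 8 + 1.5 * (8 - 3)
print(max(x[x <= upper_fence]))   # 9
```

With range = 0, the upper whisker reaches the maximum (50) and no points are flagged as outliers.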


There's more... 


If we want to extend the whiskers to the data extremes, we can set range to a value high
enough that range times the interquartile range from the box reaches beyond the most
extreme data point. Alternatively, we can simply set range to zero:

boxplot(metals[,-1],
        range=0, border="white", col="black",
        boxwex=0.3, medlwd=1, whiskcol="black",
        staplecol="black", outcol="red", cex=0.3, outpch=19,
        main="Summary of metal concentrations by Site \n (range=0)")



Showing the number of observations 


It is often useful to know the number of observations for each variable or group when 
comparing them on a box plot. We did this earlier with the varwidth argument, which makes
the widths of boxes proportional to the square root of the number of observations. In this
recipe, we will learn how to display the number of observations on a box plot.
Getting ready


We will continue using the base graphics library functions, so we need not load any additional 
library or package. We just need to run the recipe code at the R prompt. We can also save the 
code as a script to use it later. Here, we will use the metals.csv example dataset again:

metals <- read.csv("metals.csv")


How to do it...


Once again, let's use the metal concentrations box plot and display the number of 
observations for each metal below its label on the X axis: 

b <- boxplot(metals[,-1],
             xaxt="n", border="white", col="black",
             boxwex=0.3, medlwd=1, whiskcol="black",
             staplecol="black", outcol="red", cex=0.3, outpch=19,
             main="Summary of metal concentrations by Site")

axis(side=1, at=1:length(b$names),
     labels=paste(b$names, "\n(n=", b$n, ")", sep=""),
     mgp=c(3,2,0))


(Figure: Summary of metal concentrations by Site)


How it works...



In the example, we first made the same stylized box plot as we did two recipes ago, but we
suppressed drawing the default X axis by setting xaxt to "n". We then used the axis()
command to create our custom axis with the metal names and number of observations
as labels.

We set side to 1 to denote the X axis. Note that we saved the object returned by the
boxplot() function as b, which is a list containing useful information about the box plot.

You can inspect it by typing b at the R prompt and hitting Enter (after you've run the boxplot()
command). We combined the names and n (number of observations) components of b using
paste() to construct the labels argument. The at argument was set to integer values
from 1 to the number of metals. Finally, we also used the mgp argument to set the
margin line for the axis labels to 2, instead of the default 1, so that the extra line with the
number of observations doesn't make the labels overlap with the tick marks (you can see this
if you omit mgp).
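The label construction can be tried on its own. This sketch uses a made-up list, dat, in place of metals[,-1]; b$names and b$n are the components described above, and plot = FALSE returns them without drawing.

```r
# Made-up stand-in for metals[,-1]: two variables of different lengths
dat <- list(Cu = c(10, 20, 30, 40), Pb = c(5, 6, 7))
b <- boxplot(dat, plot = FALSE)

# One label per box: the name, a line break, then the count
labels <- paste(b$names, "\n(n=", b$n, ")", sep = "")
print(labels)   # "Cu\n(n=4)" "Pb\n(n=3)"

# b$n counts non-missing values only
b2 <- boxplot(list(Cu = c(10, NA, 30)), plot = FALSE)
print(b2$n)     # 2
```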


There's more... 


Another way of displaying the number of observations on a box plot is to use the
boxplot.n() function from the gplots package. First let's make sure the
gplots package is installed and loaded:

install.packages("gplots")

library(gplots)


boxplot.n(metals[,-1],
          border="white", col="black", boxwex=0.3,
          medlwd=1, whiskcol="black", staplecol="black",
          outcol="red", cex=0.3, outpch=19,
          main="Summary of metal concentrations by Site")

(Figure: Summary of metal concentrations by Site)
The problem with using this function is that the number labels are cut off by the axis. One way
to get around this problem is to place the labels at the top of the plot region by setting the top
argument to TRUE in the boxplot.n() function call.


Splitting a variable at arbitrary values into subsets


In this recipe, we will learn how to split a variable at arbitrary intervals of our choice to 
compare the box plots of values within each interval. 


Getting ready 


We will continue using the base graphics library functions, so we need not load any additional 
library or package. We just need to run the recipe code at the R prompt. We can also save the 
code as a script to use it later. Here, we will use the metals.csv example dataset again:

metals <- read.csv("metals.csv")


How to do it...



Let's make a box plot of copper (Cu) concentrations split at values 0, 40, and 80:


cuts <- c(0, 40, 80)

Y <- split(x=metals$Cu, f=findInterval(metals$Cu, cuts))
boxplot(Y, xaxt="n",
        border="white", col="black", boxwex=0.3,
        medlwd=1, whiskcol="black", staplecol="black",
        outcol="red", cex=0.3, outpch=19,
        main="Summary of Copper concentrations",
        xlab="Concentration ranges", las=1)

clabels <- c("Below 0", "0 to 40", "40 to 80", "Above 80")
axis(1, at=1:length(clabels), labels=clabels,
     lwd=0, lwd.ticks=1, col="gray")

(Figure: Summary of Copper concentrations, with the ranges Below 0, 0 to 40, 40 to 80, and
Above 80 on the X axis)


How it works...


We used a combination of a few different R functions to create the example graph shown.
First, we defined a vector called cuts with the values at which we wanted to cut our vector of
concentrations. Then we used the split() function to split the copper concentrations vector
into a list of concentration vectors at the specified intervals (you can verify this by typing Y
at the R prompt and hitting Enter). Note that we used the findInterval() function to create
a vector of interval numbers, one for each value in metals$Cu, and passed it as the f argument
of split(), which uses it as the grouping factor. Then we used the boxplot() function to
create the basic box plot with the new list Y and suppressed the default X axis. Finally, we
used the axis() function to draw the X axis with our custom labels.


There's more...


Let's turn the previous example into a function to which we can simply pass a variable and the
intervals at which we wish to cut it, and it will draw the box plot accordingly:

boxplot.cuts <- function(y, cuts, ...) {

    Y <- split(y, f=findInterval(y, cuts))
    b <- boxplot(Y, xaxt="n",
                 border="white", col="black", boxwex=0.3,
                 medlwd=1, whiskcol="black", staplecol="black",
                 outcol="red", cex=0.3, outpch=19,
                 main="Summary of Copper concentrations",
                 xlab="Concentration ranges", las=1, ...)

    # Build labels such as "Below 0", "0 to 40", "40 to 80", "Above 80"
    clabels <- paste("Below", cuts[1])
    for (k in seq_len(length(cuts) - 1)) {
        clabels <- c(clabels, paste(cuts[k], "to", cuts[k+1]))
    }
    clabels <- c(clabels,
                 paste("Above", as.character(cuts[length(cuts)])))

    axis(1, at=1:length(clabels),
         labels=clabels, lwd=0, lwd.ticks=1, col="gray")
}
In the function definition, ... stands for any extra arguments to be passed on, if required.

Now that we have defined the function, we can simply call it as follows: 

boxplot.cuts(metals$Cu, c(0, 30, 60))
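To see how findInterval() and split() decide which box each value goes into, here is a sketch with made-up concentrations and the cut points from the first example. A value equal to a cut point goes into the interval that starts at it, and an interval that contains no values produces no box at all.

```r
cuts <- c(0, 40, 80)
cu <- c(-5, 12, 39, 40, 55, 79, 80, 120)   # made-up concentrations

grp <- findInterval(cu, cuts)
print(grp)          # 0 1 1 2 2 2 3 3: 0 means below 0, 3 means 80 or above

Y <- split(cu, f = grp)
print(names(Y))     # "0" "1" "2" "3": one list element (box) per interval

# Without values below 0 there is no group "0", so only three boxes are drawn
cu2 <- c(12, 55, 120)
print(names(split(cu2, findInterval(cu2, cuts))))   # "1" "2" "3"
```

In that last case, four fixed labels would no longer line up with the three boxes, which is worth keeping in mind when reusing hard-coded labels.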

Another way to plot a subset of data in a box plot is by using the subset argument. For
example, if we want to plot copper concentrations grouped by Source above a certain
threshold value (say 40), we can use:

boxplot(Cu ~ Source, data=metals, subset=Cu>40)

Note that we included an extra argument ... in the definition of boxplot.cuts(), in
addition to y and cuts. This allows us to pass in any extra arguments that we don't
explicitly use in the call to boxplot() inside the definition of our function. For example,
we can pass ylab as an argument to boxplot.cuts() even though it is not explicitly
defined as an argument.

If you find this example too cumbersome (especially the label construction), the following is
an alternative definition of boxplot.cuts() that uses the cut() function and its
automatic label creation:

boxplot.cuts <- function(y, cuts) {

    # include.lowest=TRUE keeps the minimum value when it is the lowest
    # break; otherwise cut() returns NA for it and split() drops it
    f <- cut(y, c(min(y[!is.na(y)]), cuts, max(y[!is.na(y)])),
             include.lowest=TRUE, ordered_result=TRUE)

    Y <- split(y, f=f)

    b <- boxplot(Y, xaxt="n",
                 border="white", col="black", boxwex=0.3,
                 medlwd=1, whiskcol="black", staplecol="black",
                 outcol="red", cex=0.3, outpch=19,
                 main="Summary of Copper concentrations",
                 xlab="Concentration ranges", las=1)

    # Label each box with the interval created by cut()
    axis(1, at=1:length(levels(f)), labels=levels(f),
         lwd=0, lwd.ticks=1, col="gray")
}

To create a box plot similar to the example shown earlier, we can run:
boxplot.cuts(metals$Cu, c(0, 40, 80))
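The labels that cut() creates automatically use interval notation: a square bracket means the end point is included, a round one means it is not. This sketch uses made-up values, with cut points that lie between the minimum and the maximum, and shows why include.lowest = TRUE matters when the minimum is the lowest break: without it, cut() returns NA for that value and split() silently drops it.

```r
y <- c(5, 12, 39, 41, 79, 95)   # made-up values
cuts <- c(40, 80)                # cut points between min(y) and max(y)
brk <- c(min(y), cuts, max(y))

f_default <- cut(y, brk, ordered_result = TRUE)
f_lowest  <- cut(y, brk, include.lowest = TRUE, ordered_result = TRUE)

print(levels(f_lowest))       # "[5,40]" "(40,80]" "(80,95]"
print(sum(is.na(f_default)))  # 1: the minimum, 5, is left out
print(sum(is.na(f_lowest)))   # 0
print(sapply(split(y, f_lowest), length))   # values per box: 3 2 1
```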
Creating Heat Maps and Contour Plots


In this chapter, we will cover: 

► Creating heat maps of single Z variable with scale 

► Creating correlation heat maps 

► Summarizing multivariate data in a heat map 

► Creating contour plots 

► Creating filled contour plots 

► Creating three-dimensional surface plots 

► Visualizing time series as calendar heat maps 


Introduction 


In this chapter, we will learn how to make various types of heat maps and contour plots. 

By heat maps, we mean color coded grid images, useful for visualizing correlations, trends 
and multivariate data. We will see how contour plots can be used to show topographical 
information in various two-dimensional and three-dimensional ways. 

The recipes in this chapter are a bit longer and more advanced than the ones in previous 
chapters. However, the code is clearly explained step by step, so that you can understand 
how it works. 

As with the previous chapters, it is best to try out each recipe first with the example shown 
here and then with your own datasets so that you can fully understand each line of code. 




Creating Heat Maps and Contour Plots 


Creating heat maps of single Z variable with scale


In this recipe we will learn how to make a heat map showing the variation in values of one 
variable (z) along the X and Y axes as a grid of colors, and display a scale alongside. 


Getting ready 


We are only using the base graphics functions for this recipe. So, just open up the R prompt 
and type the following code. We will use the sales.csv example dataset for this recipe. So
let's first load it:

sales <- read.csv("sales.csv")

We will use the RColorBrewer package for some good color palettes. So let's make sure it's
installed and loaded:

install.packages("RColorBrewer")

library(RColorBrewer)


How to do it... 


The sales dataset has monthly sales data for four cities. Let's make a heat map with the 
months along the X axis and the cities on the Y axis: 

rownames(sales) <- sales[,1]

sales <- sales[,-1]

data_matrix <- data.matrix(sales)

pal = brewer.pal(7, "YlOrRd")

breaks <- seq(3000, 12000, 1500)

#Create layout with 1 row and 2 columns (for the heat map and scale);
#the heat map column is 8 times as wide as the scale column

layout(matrix(data=c(1,2), nrow=1, ncol=2), widths=c(8,1),
       heights=c(1,1))

#Set margins for the heat map

par(mar = c(5,10,4,2), oma=c(0.2,0.2,0.2,0.2), mex=0.5)

image(x=1:nrow(data_matrix), y=1:ncol(data_matrix),
      z=data_matrix, axes=FALSE, xlab="Month",
      ylab="", col=pal[1:(length(breaks)-1)],
      breaks=breaks, main="Sales Heat Map")

axis(1, at=1:nrow(data_matrix), labels=rownames(data_matrix),
     col="white", las=1)

axis(2, at=1:ncol(data_matrix), labels=colnames(data_matrix),
     col="white", las=1)

abline(h=c(1:ncol(data_matrix))+0.5,
       v=c(1:nrow(data_matrix))+0.5, col="white", lwd=2, xpd=FALSE)

breaks2 <- breaks[-length(breaks)]

#Color scale

image(x=1, y=0:length(breaks2), z=t(matrix(breaks2))*1.001,
      col=pal[1:(length(breaks)-1)], axes=FALSE, breaks=breaks,
      xlab="", ylab="", xaxt="n")

axis(4, at=0:(length(breaks2)-1), labels=breaks2, col="white",
     las=1)

abline(h=c(1:length(breaks2)), col="white", lwd=2, xpd=FALSE)

If you get a figure margins error while running the above code, enlarge the plot device or
adjust the margins so that the graph and scale fit within the device.


How it works...



We used a lot of steps and different function calls to create the heat map. Let's go through 
them one by one to understand how it all works. 

Basically, we used the image() function in the base graphics library to create the heat map
and its color scale. There is also a heatmap() function in the stats package and a
heatmap.2() function in the gplots package. However, we used image() because it is more
flexible for our purpose.

First, we had to put the data in the correct format for image(), which requires that
the z argument be a matrix. The first column of the sales dataset contains
the month names, which we assigned as the row names. Then we removed the month
column from the dataset and cast it as a matrix called data_matrix, containing only
numerical values.

We defined breaks as a sequence of values from 3000 up to 12000 with steps of 1500.
These values are used to map the sales values to the color scale, where each color denotes
values within a certain range. We used the RColorBrewer palette YlOrRd, which contains
seven warm colors.

We created a graph layout with one row and two columns using the layout() function. The
left column for the heat map is eight times as wide as the right column for the color scale, and
their heights are equal.

We used the image() function to create the heat map. The main argument is z, which we
set to data_matrix. The x and y arguments take the indices of the rows and columns of the
matrix respectively. We set the breaks argument to the breaks vector we created earlier and
set the col argument to our palette, but with the number of colors one less than the number
of breaks. This is a requirement of the image() function: n breaks define n - 1 intervals, and
each interval needs one color.
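The relationship between breaks and colors is easy to check on its own. In this sketch, heat.colors(7) stands in for brewer.pal(7, "YlOrRd") so that it runs without RColorBrewer, and the sales values are made up. cut() with include.lowest = TRUE bins values into right-closed intervals, the same kind of binning image() uses to pick a color for each z value; values outside the range of breaks get NA and are left blank.

```r
breaks <- seq(3000, 12000, 1500)
print(length(breaks))              # 7 break points ...
n_col <- length(breaks) - 1        # ... define 6 intervals, so 6 colors
pal <- heat.colors(7)              # stand-in for brewer.pal(7, "YlOrRd")
cols <- pal[1:n_col]

# Which interval (and color) would some made-up sales values fall in?
sales_values <- c(3200, 7600, 11999, 2500)
idx <- cut(sales_values, breaks, include.lowest = TRUE, labels = FALSE)
print(idx)          # 1 4 6 NA: 2500 lies below the first break
print(cols[idx])
```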

Note that we suppressed the drawing of the default axes. We used the axis() command to
draw the X and Y axes with row and column names respectively as the labels. The abline()
function call is used to draw the white lines separating each block of color on the heat map (a
bit like gridlines). These lines make the graph look nicer and a bit easier to read.

Finally, we drew the color scale by issuing another image() function call. We first created
a subset of breaks, called breaks2, without the last element of breaks. We passed
a transpose of a matrix of breaks2 as the z argument to image(). Note that we also
multiplied it by 1.001, to create a set of values just above each break so that they are colored
appropriately. We used the same breaks and col arguments as the heat map. We added a
Y axis on side 4 to mark the break values and also used abline() to draw white horizontal
lines to separate the breaks.
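The reason for multiplying by 1.001 shows up when the lower edges of the intervals are binned the same way as the heat map values. Because the intervals are closed on the right, every edge except the first falls into the interval below it; nudging each edge slightly upward puts exactly one value in each interval, which is what the color scale needs.

```r
breaks  <- seq(3000, 12000, 1500)
breaks2 <- breaks[-length(breaks)]   # lower edge of each interval

plain  <- cut(breaks2,         breaks, include.lowest = TRUE, labels = FALSE)
nudged <- cut(breaks2 * 1.001, breaks, include.lowest = TRUE, labels = FALSE)

print(plain)    # 1 1 2 3 4 5: 4500 lands in the first interval, 3000 to 4500
print(nudged)   # 1 2 3 4 5 6: one value per interval
```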
There's more...


The preceding code may seem a bit too complicated at first, but if you go through each 
statement and function call carefully, you will notice that it is just a big block of code with the 
same building blocks that we have used in earlier recipes. The best way to really understand 
the recipe and to modify it for your own needs is to change, add, or remove arguments from 
each function call and see the resulting effects. 


See also 


In the next few recipes, we will continue using the image() function to make some more
types of heat maps.


Creating correlation heat maps 


In this recipe, we will learn how to make a correlation heat map from a matrix of 
correlation coefficients. 


Getting ready 


We are only using the base graphics functions for this recipe. So, just open up the R prompt 
and type the following code. We will use the genes.csv example dataset for this recipe. So
let's first load it:

genes <- read.csv("genes.csv")


How to do it... 


Let's make a heat map showing the correlation between genes in a matrix: 

rownames(genes) <- genes[,1]

data_matrix <- data.matrix(genes[,-1])

pal = heat.colors(5)

breaks <- seq(0, 1, 0.2)

layout(matrix(data=c(1,2), nrow=1, ncol=2), widths=c(8,1),
       heights=c(1,1))

par(mar = c(3,7,12,2), oma=c(0.2,0.2,0.2,0.2), mex=0.5)

image(x=1:nrow(data_matrix), y=1:ncol(data_matrix),
      z=data_matrix, xlab="", ylab="", breaks=breaks,
      col=pal, axes=FALSE)

text(x=1:nrow(data_matrix)+0.75, y=par("usr")[4]+1.25,
     srt = 45, adj = 1, labels = rownames(data_matrix),
     xpd = TRUE)

axis(2, at=1:ncol(data_matrix), labels=colnames(data_matrix),
     col="white", las=1)

abline(h=c(1:ncol(data_matrix))+0.5, v=c(1:nrow(data_matrix))+0.5,
       col="white", lwd=2, xpd=FALSE)

title("Correlation between genes", line=8, adj=0)

breaks2 <- breaks[-length(breaks)]

#Color scale

image(x=1, y=0:length(breaks2), z=t(matrix(breaks2))*1.001,
      col=pal[1:(length(breaks)-1)], axes=FALSE, breaks=breaks,
      xlab="", ylab="", xaxt="n")

axis(4, at=0:(length(breaks2)), labels=breaks, col="white", las=1)

abline(h=c(1:length(breaks2)), col="white", lwd=2, xpd=FALSE)

(Figure: Correlation between genes)
How it works...


Just like in the previous recipe, we first formatted the data by using the first column values as
row names and casting the data frame as a matrix. We created a palette of five colors using the
heat.colors() function and defined a sequence of breaks 0, 0.2, 0.4, ..., 1.0.

Then we created a layout with one row and two columns (one for the heat map and the other 
for the color scale). We created the heat map using the image() command in a similar way
to the previous recipe, passing the data matrix as the value of the z argument.

We added custom X axis labels using the text() function instead of the axis() function, so
that we could rotate them. We also placed the labels in the top margin instead of the usual
bottom margin to improve the readability of the graph. This way, it more closely resembles a
gene correlation matrix of numbers, where the names of the genes are shown on the
top and left. To create the rotated labels, we set the srt argument to 45, thus setting the
angle of rotation to 45 degrees.

Finally, we added a color scale to the right of the heat map. 


There's more... 


We can use a more contrasting color scale to differentiate between the correlation values. For 
example, to highlight the diagonal values of 1 more clearly, we can substitute the last color in 
our palette with white. 
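A minimal sketch of that substitution follows. Keep in mind that with breaks of 0, 0.2, ..., 1, the last color covers the whole interval from 0.8 to 1, so any off-diagonal correlation above 0.8 also turns white; the made-up values in r illustrate this.

```r
pal <- heat.colors(5)
pal[length(pal)] <- "white"    # last interval, (0.8, 1], drawn in white
breaks <- seq(0, 1, 0.2)

# Made-up correlations: a diagonal value, a strong one and a moderate one
r <- c(1, 0.85, 0.5)
idx <- cut(r, breaks, include.lowest = TRUE, labels = FALSE)
print(idx)        # 5 5 3
print(pal[idx])   # "white" "white" and a warm color
```

In the recipe's code, this pal would take the place of heat.colors(5) before the two image() calls.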



If you get a figure margins error while running the code, enlarge the plot 
device or adjust the margins so that the graph and scale fit within the device. 



Summarizing multivariate data in a heat map 


In the preceding couple of recipes, we have looked at representing a matrix of data along two 
axes on a heat map. In this recipe, we will learn how to summarize multivariate data using a 
heat map. 


Getting ready 


We are only using the base graphics functions for this recipe. So, just open up the R prompt 
and type the following code. We will use the nba.csv example dataset for this recipe. So let's
first load it: 

nba <- read. csv("nba.csv") 
This example dataset showing some statistics on the top scorers in NBA basketball games
has been taken from a blog post on FlowingData (see
http://flowingdata.com/2010/01/21/how-to-make-a-heatmap-a-quick-and-easy-solution/ for
details). The original data is from the databaseBasketball.com website
(http://databasebasketball.com/). We will use our own code to create a similar heat map
showing player statistics.

We will use the RColorBrewer library for a nice color palette, so let's load it:

library(RColorBrewer)


How to do it...


We are going to summarize a number of NBA player statistics in the same heat map using the
image() function:

rownames(nba) <- nba[,1]

data_matrix <- t(scale(data.matrix(nba[,-1])))

pal = brewer.pal(6, "Blues")

statnames <- c("Games Played", "Minutes Played", "Total Points",
               "Field Goals Made", "Field Goals Attempted",
               "Field Goal Percentage", "Free Throws Made",
               "Free Throws Attempted", "Free Throw Percentage",
               "Three Pointers Made", "Three Pointers Attempted",
               "Three Point Percentage", "Offensive Rebounds",
               "Defensive Rebounds", "Total Rebounds", "Assists", "Steals",
               "Blocks", "Turnovers", "Fouls")

par(mar = c(3,14,19,2), oma=c(0.2,0.2,0.2,0.2), mex=0.5)

#Heat map
image(x=1:nrow(data_matrix), y=1:ncol(data_matrix),
      z=data_matrix, xlab="", ylab="", col=pal, axes=FALSE)

#X axis labels
text(1:nrow(data_matrix), par("usr")[4] + 1,
     srt = 45, labels = statnames, xpd = TRUE)

#Y axis labels
axis(2, at=1:ncol(data_matrix), labels=colnames(data_matrix),
     col="white", las=1)

#White separating lines
abline(h=c(1:ncol(data_matrix))+0.5,
       v=c(1:nrow(data_matrix))+0.5,
       col="white", lwd=1, xpd=FALSE)

#Graph Title
text(par(
"NBA per


How it works...



Once again, in a way similar to the preceding couple of recipes, we first formatted the dataset
with the appropriate row names (in this case, the names of players) and cast it as a matrix. We
did one additional thing: we scaled the values in the matrix using the scale() function, which
centers and scales each column (one per statistic, before we transposed the matrix with t()) so
that we can denote the relative values of each column on the same color scale.
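What scale() does to each column can be seen on a tiny made-up matrix with two statistics for three players. After scaling, every column has mean 0 and standard deviation 1, so points and assists can share one color scale; t() then turns the statistics into rows, which image() places along the X axis.

```r
# Made-up statistics: rows are players, columns are statistics
m <- cbind(PTS = c(30, 25, 20), AST = c(5, 9, 4))
z <- scale(m)

print(round(colMeans(z), 10))  # 0 0: each column is centered
print(apply(z, 2, sd))         # 1 1: and scaled to unit standard deviation
print(z[, "PTS"])              # 1 0 -1: (value - 25) / 5

zt <- t(z)                     # statistics become rows, players become columns
print(dim(zt))                 # 2 3
```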

We chose a blue color palette from the RColorBrewer library. We also created a vector with
the descriptive names of the player statistics to use as labels for the X axis.

The code for the heat map itself and the axis labels is very similar to the previous recipe. We
used the image() function with data_matrix as z and suppressed the default axes. Then
we used text() and axis() for adding the X and Y axis labels. We also used the text()
function to add the graph title (instead of the title() function) in order to left-align it with
the Y axis labels instead of the heat map.


There's more... 


As shown in the FlowingData blog post, we can order the data in the matrix as per the values 
in any one column. By default, the data is in ascending order of total points scored by each 
player (as can be seen from the light to dark blue progression in the Total Points column). To 
order the players based on their scores from highest to lowest, we need to run the following 
code after reading the CSV file: 

nba <- nba[order(nba$PTS), ]



See the help on the order() function by running ?order or help(order)
at the R prompt.
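To check the direction of the sort before drawing anything, order() can be tried on a toy data frame (the player names and points below are invented). order() returns row indices in ascending order unless decreasing = TRUE is given. Assuming, as in this recipe, that image() draws the first row of the matrix at the bottom of the plot, an ascending sort is what places the top scorer at the top of the heat map.

```r
# Invented example data, only to illustrate order()
players <- data.frame(Name = c("Player A", "Player B", "Player C"),
                      PTS  = c(22.1, 30.2, 26.5),
                      stringsAsFactors = FALSE)

asc  <- players[order(players$PTS), ]                     # lowest scorer first
desc <- players[order(players$PTS, decreasing = TRUE), ]  # highest scorer first

asc$Name    # "Player A" "Player C" "Player B"
desc$Name   # "Player B" "Player C" "Player A"
```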



Then we can run the rest of the code to make the following graph: 
The arguments x and y specify the locations of the grid at which the height values (z) are
specified. The volcano dataset contains topographic information on a 10 m by 10 m grid, so we set
the x and y grid arguments to 10 times the index numbers of rows and columns respectively.

The contour data z is provided by the volcano dataset in matrix form.

The graph shows the height of the region in the form of contour lines, which outline all areas 
with the same height. The height for each contour line is shown in gray. 
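The contour lines do not have to be drawn to be inspected. As a sketch, contourLines() from the grDevices package computes contour lines for the same grid and returns them as a list, each element holding one line's height (level) and its x and y coordinates. The 20 m level spacing below is an arbitrary choice for illustration.

```r
# Same 10 m grid as used for contour() in this recipe
x <- 10 * 1:nrow(volcano)
y <- 10 * 1:ncol(volcano)

# Compute, without drawing, contour lines every 20 m between 100 and 180 m
cl <- contourLines(x, y, volcano, levels = seq(100, 180, by = 20))

# Each element is one line: a single height ('level') plus x and y vectors
length(cl)                                      # number of separate lines
sort(unique(sapply(cl, function(l) l$level)))   # heights that were found
```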


There's more... 


Now let's improve the graph by making the Y axis labels horizontal and adding some colors to 
the plot area and contour lines: 

par(las=1)   # horizontal axis labels

# Empty plot that only sets up the axes, labels, and title
plot(0, 0, xlim=c(0, 10*nrow(volcano)), ylim=c(0, 10*ncol(volcano)),
     type="n", xlab="Metres West",
     ylab="Metres North", main="Topography of Maunga Whau Volcano")

# Fill the whole plot region (its user coordinates) with light green
u <- par("usr")
rect(u[1], u[3], u[2], u[4], col="lightgreen")

# Draw red contour lines on top of the existing plot
contour(x=10*1:nrow(volcano), y=10*1:ncol(volcano),
        volcano, col="red", add=TRUE)


[Figure: Topography of Maunga Whau Volcano]


See also 


In the next recipe, we will learn how to make filled contour plots, which use solid color to make 
the graph even easier to read. 


Creating filled contour plots 


In this recipe, we will learn how to make a contour plot with the areas between the contours 
filled in solid color. 


Getting ready 


We are only using the base graphics functions for this recipe. So, just open up the R prompt
and type the code we are about to see. We will use the inbuilt volcano dataset, so we need
not load anything.



Let's make a filled contour plot showing the terrain data of the Maunga Whau volcano in R's
inbuilt volcano dataset:

filled.contour(x = 10*1:nrow(volcano), y = 10*1:ncol(volcano),
  z = volcano, color.palette = terrain.colors,
  plot.title = title(main = "The Topography of Maunga Whau",
    xlab = "Meters North", ylab = "Meters West"),
  plot.axes = {axis(1, seq(100, 800, by = 100))
               axis(2, seq(100, 600, by = 100))},
  key.title = title(main = "Height\n(meters)"),
  key.axes = axis(4, seq(90, 190, by = 10)))
[Figure: The Topography of Maunga Whau]
If you type ?filled.contour, you will see that the preceding example is taken from that
help file (see the second example at the end of the help file). The filled.contour()
function creates a contour plot with the areas between the contour lines filled with solid
colors. In this case, we chose the terrain.colors() function to use a color palette
suitable for showing geographical elevations. We set the color.palette argument to
terrain.colors and the filled.contour() function automatically calculates the
number of color levels.
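By default, filled.contour() computes its levels as pretty(zlim, nlevels) with nlevels = 20, and asks the palette for length(levels) - 1 colors, one for each band between adjacent levels. The sketch below repeats that arithmetic by hand for the volcano heights, without drawing anything, to show how many terrain colors the plot ends up using.

```r
zlim <- range(volcano, finite = TRUE)   # filled.contour()'s default zlim

lev  <- pretty(zlim, 20)                # default levels (nlevels = 20)
cols <- terrain.colors(length(lev) - 1) # one color per band between levels

range(lev)     # the outer levels enclose all the heights
length(cols)   # number of filled bands, and colors, in the plot
```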

The basic arguments are the same as those for contour(), namely, x and y, which specify
the locations of the grid at which the height values (z) are specified. The contour data z is
provided by the volcano dataset in matrix form.

The filled.contour() function is slightly different from other basic graph functions
because it automatically creates a layout with the contour plot and key. We can't suppress
or customize the styling of the key to a great extent. Also, some of the standard graph
parameters have to be passed to other functions. For example, the axis labels xlab and ylab
have to be passed as arguments to the title() function, which is passed as the value for the
plot.title argument. We cannot directly pass xlab and ylab to filled.contour().


We also have to add our custom axes by setting the plot.axes argument to a block of calls
to the axis() function enclosed in braces. Unlike other functions, we cannot simply set axes to
FALSE and call axis() after drawing the graph, because filled.contour() uses layout()
internally. If we add axes after calling filled.contour(), the X axis will extend beyond
the contour plot up to the key.

Finally, we set the title and tick labels of the key using the key.title and key.axes
arguments respectively. Once again, we had to set these arguments to function calls to
title() and axis() respectively instead of directly specifying the values.


There's more...


We can adjust the level of detail and smoothness between the contours by increasing their
number using the nlevels argument:

filled.contour(x = 10*1:nrow(volcano), y = 10*1:ncol(volcano),
  z = volcano, color.palette = terrain.colors,
  plot.title = title(main = "The Topography of Maunga Whau",
    xlab = "Meters North", ylab = "Meters West"), nlevels = 100,
  plot.axes = {axis(1, seq(100, 800, by = 100))
               axis(2, seq(100, 600, by = 100))},
  key.title = title(main = "Height\n(meters)"),
  key.axes = axis(4, seq(90, 190, by = 10)))


[Figure: The Topography of Maunga Whau]
Note that there are a lot more contours now and the plot looks a lot smoother. The default
value of nlevels is 20, so we increased it fivefold. The key doesn't look very nice because
of the many black lines between tick marks; however, as pointed out earlier, we cannot
control that without changing the definition of the filled.contour() function itself.
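The same arithmetic explains the smoother result: with nlevels = 100, pretty() is asked for about five times as many intervals, so each band covers a narrower range of heights. pretty() treats nlevels as a target rather than an exact count. A small sketch comparing the two settings:

```r
zlim <- range(volcano)

lev20  <- pretty(zlim, 20)    # default nlevels
lev100 <- pretty(zlim, 100)   # nlevels = 100

# Number of filled bands and the height step between adjacent levels
c(bands20 = length(lev20) - 1, bands100 = length(lev100) - 1)
c(step20 = diff(lev20)[1], step100 = diff(lev100)[1])
```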


See also 


In the next recipe, we will learn how to make a three-dimensional version of a filled contour plot. 


Creating three-dimensional surface plots 


In this recipe, we will use a special library to make a three-dimensional (3D) surface plot for
the volcano dataset. The resulting plot will also be interactive so that we can rotate the
visualization using a mouse to look at it from different angles.


Getting ready 


For this recipe, we will use the rgl package, so we must first install and load it:

install.packages("rgl")

library(rgl)

We will only use the inbuilt volcano dataset, so we need not load any other dataset.



Let's make a simple three-dimensional surface plot showing the terrain of the Maunga 
Whau volcano: 


z <- 2 * volcano              # exaggerate the relief
x <- 10 * (1:nrow(z))         # 10 m spacing along the rows
y <- 10 * (1:ncol(z))         # 10 m spacing along the columns

zlim <- range(z)
zlen <- zlim[2] - zlim[1] + 1

colorlut <- terrain.colors(zlen)      # height color lookup table
col <- colorlut[z - zlim[1] + 1]      # assign a color to each point

rgl.open()

rgl.surface(x, y, z, color=col, back="lines")









The 3D surface will look like the following:




RGL is a 3D real-time rendering device driver system for R. We used the rgl.surface()
function to create the preceding visualization. Please see the help section (by running
?rgl.surface at the R prompt) to see the original example at the bottom of the help file, on
which this example is based.

We basically used the volcano dataset that we used in the previous couple of recipes and
created a three-dimensional representation of the volcano's topography instead of the
two-dimensional contour representation.

We set up the x, y, and z arguments in a similar way to the contour examples, except that
we multiplied the volcano height data in z by 2 to exaggerate the terrain, which makes the
library's 3D capabilities easier to appreciate.

Then we defined a vector of colors, one for each point in z, such that each height value
is mapped to its own color from a lookup table created with the terrain.colors() function.
We saved the mapped color data in col (if you type col at the R prompt and hit Return (or
Enter), you will see that it contains 5,307 colors, one for each point of the 87 x 61 grid).
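The color lookup does not depend on rgl, so it can be checked on its own in base R. In the sketch below, subtracting zlim[1] and adding 1 turns the lowest height into index 1 of the lookup table, and indexing the plain vector colorlut with the matrix of indices returns one color per grid point.

```r
z <- 2 * volcano                      # exaggerated heights, as in the recipe
zlim <- range(z)
zlen <- zlim[2] - zlim[1] + 1         # one table entry per integer height

colorlut <- terrain.colors(zlen)      # height-to-color lookup table
col <- colorlut[z - zlim[1] + 1]      # lowest height -> colorlut[1]

length(col)                           # 5307: one color per point of the 87 x 61 grid
col[which.max(z)] == colorlut[zlen]   # the highest point gets the last color
```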

Then we opened a new RGL device with the rgl.open() command. This brings up a blank
window with a gray background. Finally, we called the rgl.surface() function with the
x, y, z, and color arguments. We also set the back argument to "lines", which resulted
in a wire-framed polygon underneath the visualization.

Once rgl.surface() is run, we can rotate the visualization using our mouse in any
direction. This lets us look at the volcano from any angle. If we look underneath, we can also
see the wire-frame. The images show snapshots of the volcano from four different angles.











There's more... 
The example is a very basic demonstration of the rgl package's functionality.

There are a number of other functions and settings we can use to create much more complex
visualizations customized to our needs. For example, the back argument can be set to other
values to create a filled, point, or hidden polygon. We can also set the transparency
(or opacity) of the visualization using the alpha argument. Arguments controlling the
appearance of the visualization are sent to the rgl.material() function, which sets
the material properties.

Please read the related help sections (?rgl, ?rgl.surface, ?rgl.material) to get a
more in-depth understanding of this library.


Visualizing time series as calendar heat maps


In this recipe, we will learn how to make intuitive heat maps in a calendar format to 
summarize time series data. 


Getting ready 


In this recipe, we will use a custom function called calendarHeat() written by Paul Bleicher
(released as open source under the GPL license). So let's first load the source code of the
function (available from the downloads area of the book's website):

source("calendarHeat.R")

We are going to use the google.csv example dataset, which contains stock price data for
Google (ticker GOOG). Let's load it:

stock.data <- read.csv("google.csv")

calendarHeat() also makes use of the chron library, which has to be installed and loaded
using the following:

install.packages("chron")

library("chron")
How to do it...


Let's visualize the adjusted closing price of the Google stock in a calendar heat map: 


calendarHeat(dates=stock.data$Date,
             values=stock.data$Adj.Close,
             varname="Google Adjusted Close")


[Figure: Calendar Heat Map of Google Adjusted Close]
We used the calendarHeat() function, which uses the grid, lattice, and chron
libraries, to make the heat map. The main arguments are dates and values, which we set
to the Date and Adj.Close columns of our dataset respectively. We also used the varname
argument to set the title of the heat map.

There are several other arguments which can be passed to calendarHeat(). For example,
we can specify the format our input dates are in using the date.form argument. The default
format is YYYY-MM-DD, which matches our original dataset. However, if the dates were in
another format, say MM-DD-YY, we could set date.form to "%m-%d-%y".

The number of colors in the color scale is controlled by the ncolors argument, which has a
default value of 99. The color scheme is specified by the color argument, which takes some
predefined palette names as values. The default is r2g (red to green), and other options
are r2b (red to blue) and w2b (white to blue). We can add more options simply by adding a
definition for a new color palette as a vector of colors.
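The date.form argument takes the same format codes as base R's as.Date() and strptime(). A quick way to confirm a format string before calling calendarHeat() is to try it on a single date with as.Date(); the dates below are made up.

```r
# Default layout, YYYY-MM-DD
as.Date("2010-01-21", format = "%Y-%m-%d")

# MM-DD-YY layout: %m = month, %d = day, %y = two-digit year
d <- as.Date("01-21-10", format = "%m-%d-%y")
d

# A mismatched format string gives NA rather than an error
as.Date("01-21-10", format = "%Y-%m-%d")
```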


There's more... 


Another useful package which provides a calendar heat map functionality is the openair
package, which has been primarily created for air pollution data analysis. Let's make a
pollution heat map using this package.

First, we need to install and load it: 

install.packages("openair")

library(openair)








To make our first air pollution calendar heat map, we can simply run: 
calendarPlot(mydata) 
The graph shows some nitrogen oxides (NOx) concentration data from London in 2003 in the
form of a heat map overlaid on a regular calendar.

We only had to pass one argument, mydata, to the calendarPlot() function; mydata is the
package's default example dataset. Run head(mydata) at the R prompt to see what the
data looks like and all the columns in the dataset. The first column contains GMT date and time
values in a long format (YYYY-MM-DD HH:MM:SS). We can use calendarPlot() as it is to
visualize other types of temporal data, as long as the date column is in the same format and
we specify the variable to be plotted using the pollutant argument. The default value of
pollutant is "nox", which is the name of the column that contains the NOx values.


Let's say we want to plot daily sales data instead. Let's use the rnorm() function to create
some fake data and add it as a column to the mydata dataset:

mydata$sales <- rnorm(length(mydata$nox), mean=1000, sd=1500)

The code added a sales column to mydata, with random values following a normal
distribution with a mean of 1000 and standard deviation of 1500. Now, let's use
calendarPlot() to make a heat map for this sales data.

calendarPlot(mydata, pollutant="sales", main="Daily Sales in 2003")

In the example, we set the pollutant argument to the newly created sales column
(note that we have to pass it as a string in quotes). We also set the plot title using the main
argument. The calendarPlot() function uses the lattice library to generate the heat
maps. Please see the help file (?calendarPlot) for other arguments you can use.
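To use calendarPlot() on your own daily series instead of adding a column to mydata, you need a data frame with a date column plus the numeric column named in pollutant. Below is a minimal base R sketch of such a data frame; the name sales_data, the sales column, and the random values are illustrative only, and the dates are built as GMT date-times to match mydata as described above. As in the recipe, mean = 1000 with sd = 1500 also produces some negative values, which would be unrealistic for real sales.

```r
set.seed(1)   # reproducible fake data

# One timestamp per day of 2003, in GMT
dates <- seq(as.POSIXct("2003-01-01", tz = "GMT"),
             as.POSIXct("2003-12-31", tz = "GMT"), by = "day")

# Hypothetical daily sales, generated like the recipe's sales column
sales_data <- data.frame(date = dates,
                         sales = rnorm(length(dates), mean = 1000, sd = 1500))

nrow(sales_data)             # 365 days
mean(sales_data$sales < 0)   # share of (unrealistic) negative values
```

With openair loaded, a data frame like this could then be passed as calendarPlot(sales_data, pollutant = "sales").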






Creating Maps 


In this chapter, we will cover: 

► Plotting global data by countries on a world map 

► Creating graphs with regional maps 

► Plotting data on Google maps 

► Creating and reading KML data 

► Working with ESRI shapefiles 


Introduction 


In this chapter, we will take a more in-depth look at visualizing data on geographical maps, 
building on top of our brief introduction in Chapter 1. 

Overlaying datasets from different parts of the world on maps is a very good way of 
summarizing data in its correct geographical context. A lot of data is being made freely 
available. For example, the World Bank and World Health Organization (WHO) publish lots 
of socio-economic and health-related data, which can be plotted on maps. Google Maps 
provides a good API, which can be directly connected to from R as we will see in this chapter. 

We will also learn how to work with Geographical Information Systems (GIS) data 
formats in R. 


As with the previous chapters, it is best to try out each recipe first with the example shown 
here and then with your own datasets so that you can fully understand each line of code. 






Plotting global data by countries on a world map


In this recipe we will learn how to plot country-wise data on a world map. 


Getting ready 


We will use a few different additional packages for this recipe. We need the maps package for
the actual drawing of the maps, the WDI package to get World Bank data by countries, and the
RColorBrewer package for color schemes. So let's make sure these packages are installed
and loaded:

install.packages("maps")

library(maps)
install.packages("WDI")

library(WDI)

install.packages("RColorBrewer")

library(RColorBrewer)



There are a lot of different data we can pull in using the World Bank API provided by the WDI
package. In this example, let's plot some CO2 emissions data:

colors <- brewer.pal(7, "PuRd")
wgdp <- WDIsearch("gdp")

# The row number of the CO2 indicator in wgdp depends on the
# indicator list returned by WDIsearch() when the code is run
w <- WDI(country="all", indicator=wgdp[4,1], start=2005, end=2005)

# The map object calls the United States "USA", so rename it in w
w$country[w$country == "United States"] <- "USA"

x <- map(plot=FALSE)
x$measure <- array(NA, dim=length(x$names))

for(i in 1:length(w$country)) {
  for(j in 1:length(x$names)) {
    if(grepl(w$country[i], x$names[j], ignore.case=T))
      x$measure[j] <- w[i,3]
  }
}

sd <- data.frame(col=colors,
  values=seq(min(x$measure[!is.na(x$measure)]),
             max(x$measure[!is.na(x$measure)])*1.0001,
             length.out=7),
  stringsAsFactors=FALSE)

sc <- array("#FFFFFF", dim=length(x$names))

for (i in 1:length(x$measure))
  if(!is.na(x$measure[i]))
    sc[i] <- as.character(sd$col[findInterval(x$measure[i], sd$values)])

#2-column layout with color scale to the right of the map
layout(matrix(data=c(2,1), nrow=1, ncol=2), widths=c(8,1),
  heights=c(8,1))

# Color Scale first
breaks <- sd$values

par(mar=c(20,1,20,7), oma=c(0.2,0.2,0.2,0.2), mex=0.5)

image(x=1, y=0:length(breaks), z=t(matrix(breaks))*1.001,
  col=colors[1:(length(breaks)-1)], axes=FALSE,
  breaks=breaks, xlab="", ylab="", xaxt="n")

axis(side=4, at=0:(length(breaks)-1),
  labels=round(breaks), col="white", las=1)

abline(h=c(1:length(breaks)), col="white", lwd=2, xpd=F)

#Map
map(col=sc, fill=TRUE, lty="blank")
# If you get an error while running the above code, enlarge the
# graphics device so that the graph and scale fit
map(add=TRUE, col="gray", fill=FALSE)

title("CO2 emissions (kg per 2000 US$ of GDP)")
The map plot of CO2 emissions looks like the following:




We used the maps package in combination with World Bank data from the WDI package
to plot CO2 emissions data per 2000 US$ of GDP for various countries across the world.

First we chose an RColorBrewer color scheme and saved it as a vector called colors. We
then pulled a list of GDP-related variables using the WDIsearch() function. If you type wgdp
at the R prompt and hit Enter, you will see a list of codes and descriptions of each of these
variables. For the previous example, we chose the fourth variable (wgdp[4,1]), which gives
CO2 emissions (kg per 2000 US$ of GDP), and passed it to the WDI() function to get data
for all countries for the year 2005 by setting the country argument to "all" and start and
end to 2005.

Next, we created a map object x simply by calling the map() function and setting plot to
FALSE, so that the map is not drawn yet. We did this so that we can map the data we pulled
from WDI to the country polygons contained in the map object.

First we added a new array called measure to x, with NA as default values and length
matching the number of country names in x. If you type x$names at the R prompt and hit
Enter, you will see the whole list of country names. Similarly, w$country contains the names
of the countries for which the WDI package has data. Note that the map object has a lot more
names because it contains regional information at a finer detail than just countries. So, we
must first match the names of countries in the two datasets.
For the example, we use a simple search function, grepl(), which looks for the WDI country
names in the map object x and assigns the corresponding CO2 emissions values from w to
x$measure. This is a very approximate solution and misses countries whose names differ
between the two datasets. For example, the United States is named USA in the map object.
To match all the countries exactly, we need to manually check the important ones we
are interested in. In the example, the United States was corrected manually.
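Substring matching can also go wrong in the other direction: grepl() finds a pattern anywhere inside the target, so one name can match a longer, different name. Anchoring the pattern with ^ removes some of these false matches, but not those where one name is a prefix of another. The names below are typed by hand for illustration.

```r
# Unanchored: the pattern may appear anywhere in the target name
grepl("Niger", "Nigeria", ignore.case = TRUE)             # TRUE (false match)
grepl("Kansas", "arkansas", ignore.case = TRUE)           # TRUE (false match)

# Anchored at the start of the name
grepl("^Kansas", "arkansas", ignore.case = TRUE)          # FALSE
grepl("^Virginia", "west virginia", ignore.case = TRUE)   # FALSE

# Anchoring does not help when one name is a prefix of another
grepl("^Niger", "Nigeria", ignore.case = TRUE)            # still TRUE
```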

Next we created a data frame called sd to define a color scheme with intervals based on a
sequence from the minimum to the maximum values in x$measure. We use sd to assign a
color for each of the values in x$measure by creating a vector called sc. First we create sc
with default values of white, so that any missing values are depicted without any color. Then
we used the findInterval() function to assign a color to each value of x$measure.
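The sd and findInterval() step can be tried on invented numbers. findInterval(v, breaks) returns the index of the interval each value falls into, and that index selects a color from sd$col. Multiplying the maximum by 1.0001 keeps the largest value inside the last interval instead of exactly on its upper edge. The values and the color names c1 to c7 are placeholders for the real data and the RColorBrewer palette.

```r
values <- c(0.2, 1.5, 3.7, 8.0, NA)   # invented measurements, one missing
colors <- paste0("c", 1:7)            # placeholders for a 7-color palette

ok <- !is.na(values)
sd <- data.frame(col = colors,
                 values = seq(min(values[ok]), max(values[ok]) * 1.0001,
                              length.out = 7),
                 stringsAsFactors = FALSE)

# White by default, so missing values stay uncolored
sc <- rep("#FFFFFF", length(values))
sc[ok] <- as.character(sd$col[findInterval(values[ok], sd$values)])
sc   # "c1" "c1" "c3" "c6" "#FFFFFF"
```

With seven break points there are only six intervals, so in this scheme the seventh palette color is never selected.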

Finally, we have all the ingredients for making the map. We first used the layout() function
to create a 1 x 2 layout just like we did for heat maps in the previous chapter.

We need to plot the color scale first here because if we plot the map first, the scale cannot
be plotted on the same layout and results in a new plot with just the scale. We reversed this
plotting order by setting the data argument in layout() to c(2, 1) instead of c(1, 2).
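The effect of c(2, 1) can be seen with two empty panels. The numbers in the layout matrix give the order in which panels are filled, so the first plot drawn goes to the panel labeled 1 (here the narrow right column) and the second to the panel labeled 2 (the wide left column). A minimal sketch, drawing to a pdf(NULL) device so that no file is written; the small margins are an assumption added so that the narrow panel has room to plot.

```r
pdf(NULL)   # throwaway device: nothing is written to disk

# Panel 2 on the left (width 8), panel 1 on the right (width 1)
n <- layout(matrix(c(2, 1), nrow = 1, ncol = 2), widths = c(8, 1))
par(mar = c(1, 1, 1, 1))   # small margins so the narrow panel fits

plot.new(); text(0.5, 0.5, "1")   # first plot: right-hand panel (panel 1)
plot.new(); text(0.5, 0.5, "2")   # second plot: left-hand panel (panel 2)

invisible(dev.off())
n   # layout() returns the number of panels: 2
```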

The color scale is drawn in exactly the same way as in the previous chapter for heat maps,
using the image() function. To draw the map itself, we used the map() function. We set the
col argument to the vector sc, which contains colors corresponding to each polygon on the
map. We set fill to TRUE and lty to "blank", so that the polygons are filled with the
specified colors and drawn without border lines. Instead, we added gray borders by calling
the map() function with add set to TRUE, col set to "gray", and fill set to FALSE. Finally,
we added a plot title using the title() function.


There's more... 


The example shows just one variable for one year visualized on a map. The WDI
package gives 73 different metrics related to GDP alone (as can be seen in the wgdp
variable). See the help section for the WDI package for more details about other data
available (?WDI and ?WDIsearch). If you have any other data by country from another
source, you can use that with the map() function in the example as long as the country
names can be matched to the names of regions in the map object.


See also 


In the next recipe, we will learn how to plot regional data on individual country maps instead 
of on a world map. 
Creating graphs with regional maps


In this recipe we will learn how to plot data on regional maps within individual countries 
rather than the whole world map. We will look at examples based on the United States 
and European countries. 


Getting ready 


Just like the previous recipe, we will make use of the maps package for drawing the map and
the RColorBrewer package for choosing color schemes. So, let's make sure they are loaded:

library(maps)

library(RColorBrewer)

We will use the inbuilt USArrests example dataset, which contains crime statistics, in
arrests per 100,000 residents, for assault, murder, and rape in each of the 50 US states
in 1973.



Let's plot the murder arrest rates in US states in 1973. The default graphics device size
may not be big enough for the map, so if you get an error about figure margins, please enlarge
the graphics device:

x <- map("state", plot=FALSE)
x$measure <- array(NA, dim=length(x$names))

for(i in 1:length(rownames(USArrests))) {
  for(j in 1:length(x$names)) {
    # "^" anchors the match at the start of the region name, so that,
    # for example, "Kansas" does not also match "arkansas"
    if(grepl(paste0("^", rownames(USArrests)[i]), x$names[j], ignore.case=T))
      x$measure[j] <- as.double(USArrests$Murder[i])
  }
}

colors <- brewer.pal(7, "Reds")

sd <- data.frame(col=colors,
  values=seq(min(x$measure[!is.na(x$measure)]),
             max(x$measure[!is.na(x$measure)])*1.0001,
             length.out=7))

breaks <- sd$values

# Returns the palette color for each value, based on the breaks in sd
matchcol <- function(y) {
  as.character(sd$col[findInterval(y, sd$values)])
}

layout(matrix(data=c(2,1), nrow=1, ncol=2),
  widths=c(8,1), heights=c(8,1))

# Color Scale first
par(mar=c(20,1,20,7), oma=c(0.2,0.2,0.2,0.2), mex=0.5)

image(x=1, y=0:length(breaks), z=t(matrix(breaks))*1.001,
  col=colors[1:(length(breaks)-1)], axes=FALSE, breaks=breaks,
  xlab="", ylab="", xaxt="n")

axis(4, at=0:(length(breaks)-1),
  labels=round(breaks), col="white", las=1)

abline(h=c(1:length(breaks)), col="white", lwd=2, xpd=F)

#Map
map("state", boundary=FALSE, col=matchcol(x$measure),
  fill=TRUE, lty="blank")

map("state", col="white", add=TRUE)

title("Murder Rates by US State in 1973 \n (arrests per 100,000 residents)", line=2)
The example is similar to the previous recipe in its overall structure, but it differs mainly in
that we plotted data for the states of one country. We used the USArrests dataset, which is
inbuilt in R and contains various crime figures by state for the United States.

Just like the previous recipe, we first mapped the values of the chosen statistic (murder rates
in this case) to the corresponding region names (in this case states) in the map object created
using the map() function. We chose a red color scheme from RColorBrewer.

Instead of creating a vector of colors for each of the values plotted, we defined a function
matchcol() which takes a value as an argument and uses the findInterval() function
to return a color value from the data frame sd, which contains the breaks and corresponding
colors from the chosen palette.

We then created a two-column layout and drew the color scale first in the right column. Then
we plotted the map with fill set to TRUE and col set to a function call to matchcol()
with x$measure as the argument. We set boundary to FALSE and lty to "blank" so that no
default black boundaries are drawn, and then drew white boundaries by calling map() again
with col set to "white" and add set to TRUE. Finally, we used the title() function to add a map title.


There's more... 


Mapping data by states is just one of the options in the maps package for the United States.
We can also map data by counties and regions defined as groups of specific states. For
example, we can draw a county map of New York with:

map("county", "new york")

Or we can draw a map with three states with:

map("state", region = c("california", "oregon", "nevada"))

Now let's look at another example, this time from a European country:

map("italy", fill = TRUE, col = brewer.pal(7, "Set1"))
The preceding example uses the inbuilt dataset for Italy in the maps package. We used the
colors just to differentiate the various territorial units from each other; the colors do not
represent any numerical quantity. The maps package has geographical data for only a handful
of other countries (such as France and New Zealand). But there is one good source for worldwide
geographical data: the GADM database of Global Administrative Areas. One can freely download
data for countries across the world in R's native RData format for non-commercial use from the
website http://gadm.org.

The GADM data can be used in combination with the sp package to plot regional data on
maps. Let's look at an example of rainfall in France. First let's make sure the sp package is
installed and loaded:

install.packages("sp")

library(sp)
Now let's create some pseudo rainfall data for the French administrative regions and plot it on
a map of France:

load(url("http://gadm.org/data/rda/FRA_adm1.RData"))
gadm$rainfall <- rnorm(length(gadm$NAME_1), mean=50, sd=15)
spplot(gadm, "rainfall",
  # terrain.colors() takes the number of colors to generate
  col.regions = rev(terrain.colors(100)),
  main = "Rainfall (simulated) in French administrative regions")


[Figure: Rainfall (simulated) in French administrative regions]



First we loaded the geographical boundary data for France by calling the load() function
with a url() of the location of the dataset on the GADM website. In this case, the dataset
loaded was FRA_adm1.RData. This function call stores the data in an object called gadm
(you can verify this by typing gadm at the R prompt and hitting Enter). Next, we appended a
vector of pseudo rainfall data to gadm by calling the rnorm() function.
















Chapter 9 


Finally, we used the spplot() function from the sp package to plot the data. The first
argument to spplot() is the object gadm itself and the second argument is the name of the
variable we wish to plot on the map. We set the fill color of the regions using col.regions;
this is slightly different from the map() function because the sp package is based on the
lattice library. We used a color scheme based on the terrain.colors() function, but
reversed it with rev() so that low to high rainfall is represented by gray through brown
to green.
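The effect of rev() can be checked with col2rgb(): terrain.colors() runs from green through yellow and brown to a light gray, so the reversed palette starts at the gray end and finishes at the green end. A short base R sketch using a five-color palette:

```r
pal <- rev(terrain.colors(5))

# Red, green and blue components (0-255) of the first and last colors
first <- col2rgb(pal[1])[, 1]
last  <- col2rgb(pal[5])[, 1]

first   # equal red, green and blue values: a light gray
last    # green dominates: the green end of the original palette
```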


Plotting data on Google maps 


In this recipe, we will learn how to plot data on top of Google map images using a special 
package that connects to Google's Static Maps API. 


Getting ready 


First we need to install the RgoogleMaps package and a related package, rgdal:

install.packages("rgdal")
library(rgdal)

install.packages("RgoogleMaps")

library(RgoogleMaps)

We will use the londonair.csv example dataset for this recipe. This dataset contains
annual average concentrations of particulate matter in London's atmosphere measured at 12
different air quality monitoring sites across the city (data source: London air website
http://www.londonair.org.uk). So let's load that too:

air <- read.csv("londonair.csv")
How to do it...


Let's pull a Google map of London city and plot the pollution data as points on top of it: 

london <- GetMap(center=c(51.51, -0.116),
  zoom=10, destfile="London.png", maptype="mobile")

PlotOnStaticMap(london, lat=air$lat, lon=air$lon,
  cex=2, pch=19, col=as.character(air$color))





Now let's make the same graph with a satellite image map instead of the roadmap: 

london <- GetMap(center=c(51.51, -0.116), zoom=13,
  destfile="London_satellite.png", maptype="satellite")

PlotOnStaticMap(london, lat=air$lat, lon=air$lon,
  cex=2, pch=19, col=as.character(air$color))
In the examples, we first used the GetMap() function from the RgoogleMaps package to pull
a map of London from the Google Static Maps API (see
http://code.google.com/apis/maps/documentation/staticmaps/ for more details about the API).
We then used the PlotOnStaticMap() function to overlay our air pollution data points on the map.

The first and most important argument to the GetMap() function is the center argument,
which takes a vector of two values specifying the latitude and longitude of the location to be
used as the center of the map. The zoom level is specified by the zoom argument, which has a
default value of 12. The higher the value of zoom, the more detailed and zoomed in the view.
In the example, we set zoom to 10 so as to capture a wide area of London.


We also specified the destfile argument to save the retrieved map as London.png. The
default value of destfile is MyTile.png. You can check whether the map was retrieved by
looking for the PNG file in your working directory.
Finally, we also set the maptype argument, which can take one of a number of different
values such as "roadmap", "mobile", "satellite", "terrain", "hybrid",
"mapmaker-roadmap", and "mapmaker-hybrid". The default map type is terrain.

We set maptype to mobile in the first example and satellite in the second example.

If you look at the output of the GetMap() function call at the R prompt, you will notice that it
shows a URL such as:

[1] http://maps.google.com/staticmap?center=51.51,-0.116&zoom=10&size=640x640&maptype=mobile&format=png32&key=&sensor=true

Basically, the GetMap() function creates an HTTP GET request URL with parameters based
on the arguments supplied. To test this, copy the URL and paste it into the address
bar of a web browser. You should get the image of the specified map.
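The mapping from arguments to query parameters can be sketched with paste0(). static_map_url() below is a hypothetical helper, not part of RgoogleMaps; it only reproduces the shape of the URL shown above, and that endpoint reflects the Static Maps API at the time of writing, so treat the URL as illustrative.

```r
# Hypothetical helper (not part of RgoogleMaps): assemble a Static Maps
# request URL from the same pieces GetMap() takes as arguments.
static_map_url <- function(center, zoom = 12, size = c(640, 640),
                           maptype = "terrain", format = "png32") {
  paste0("http://maps.google.com/staticmap?",
         "center=", center[1], ",", center[2],   # latitude, longitude
         "&zoom=", zoom,
         "&size=", size[1], "x", size[2],        # width x height in pixels
         "&maptype=", maptype,
         "&format=", format,
         "&key=&sensor=true")
}

url <- static_map_url(c(51.51, -0.116), zoom = 10, maptype = "mobile")
url
```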

We saved the object returned by the GetMap() function call as london, which we then
passed as the first argument to the PlotOnStaticMap() function. As the name suggests,
this function plots data on top of map objects. The air pollution dataset londonair.csv that
we loaded earlier contains monitoring site data including site code, name, latitude, longitude,
particle concentration (PM10), and a color based on the concentration value. We passed
these values to the PlotOnStaticMap() function. We set the lat and lon arguments to
the lat and lon columns in the air data frame respectively. We set the col argument to the
color column in air.


There's more... 


We can overlay more data points or lines successively on top of a map by setting an additional
argument add to TRUE. By default, add is set to FALSE, which creates a new map with
the specified data points or lines. To draw lines instead of points, we need to set the FUN
(meaning function) argument to lines. By default, FUN is set to points.

The following is another example pulling in a hybrid map of New York: 

GetMap(center=c(40.714728, -73.99867), zoom=14,
    destfile="Manhattan.png", maptype="hybrid")

Another map source, which is becoming increasingly popular, is OpenStreetMap
(http://www.openstreetmap.org/). It's a free, open source, and editable map,
unlike Google's proprietary maps API. The following is an example based on the
GetMap.OSM() function, which uses the OpenStreetMap server:

GetMap.OSM(lonR=c(-74.67102, -74.63943),
    latR=c(40.33804, 40.3556), scale=7500,
    destfile="PrincetonOSM.png")


GetMap.OSM() takes the ranges of longitude and latitude as two two-valued vectors, lonR
and latR respectively. The scale argument is analogous to the zoom argument for the
Google API. The larger this value, the more detailed the resulting map.




See also 
In the next recipe, we will learn how to interact with Google's KML language for expressing
geographic data.


Creating and reading KML data 


In this recipe, we will learn how to read and write geographic data in Google's Keyhole Markup 
Language (KML) format, which can be used to visualize geographic data with Google Earth 
and Google Maps. 


Getting ready 


We will use the rgdal package in this recipe. So let's make sure it's installed and load it:

install.packages("rgdal")
library(rgdal)


How to do it...


We will use data from the cities shapefile that's installed as part of the rgdal package.
First we will write a KML file and then read it:

cities <- readOGR(system.file("vectors",
    package="rgdal")[1], "cities")

writeOGR(cities, "cities.kml", "cities", driver="KML")
df <- readOGR("cities.kml", "cities")



In the preceding example, we first used the readOGR() function to read the cities
shapefile dataset. The first argument is the folder (directory) where the shapefile
is located and the second argument is the name of the shapefile (without the .shp extension).

We stored the object returned by the readOGR() function as cities, which is of class
SpatialPointsDataFrame.

To create a KML file, we used the writeOGR() function. We passed the cities object as
the first argument. The second argument specifies the name of the output KML file, the third
argument specifies the layer name (without extension), and the fourth argument is
the driver (in this case KML).
To read the KML file back into R, we used the readOGR() function with only two arguments.
The first argument specifies the KML data file to be read and the second argument specifies
the name of the layer.
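KML is plain XML, so it helps to see roughly what a point looks like inside such a file. The following is a minimal sketch in base R, assuming a single point and the KML 2.2 namespace; write_point_kml() and read_point_kml() are hypothetical helpers for illustration only. They are not replacements for writeOGR() and readOGR(), which also handle attributes, multiple features, and layers.

```r
# Hypothetical helper: write a single point as a minimal KML document
write_point_kml <- function(file, name, lon, lat) {
  kml <- c(
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<kml xmlns="http://www.opengis.net/kml/2.2">',
    '  <Placemark>',
    paste0('    <name>', name, '</name>'),
    # KML lists longitude first, then latitude (then an optional altitude)
    paste0('    <Point><coordinates>', lon, ',', lat, '</coordinates></Point>'),
    '  </Placemark>',
    '</kml>')
  writeLines(kml, file)
}

# Hypothetical helper: extract the coordinates of a single-point file
read_point_kml <- function(file) {
  txt <- paste(readLines(file), collapse = " ")
  coords <- sub(".*<coordinates>([^<]*)</coordinates>.*", "\\1", txt)
  xy <- as.numeric(strsplit(trimws(coords), ",")[[1]])
  c(lon = xy[1], lat = xy[2])
}

f <- tempfile(fileext = ".kml")
write_point_kml(f, "London", lon = -0.116, lat = 51.51)
read_point_kml(f)
```

Note the longitude-first order inside the coordinates element: it is the reverse of the latitude-first center argument of GetMap() in the previous recipe.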


See also 


In the next recipe, we will learn how to work with ESRI shapefiles. 


Working with ESRI shapefiles 


In this recipe, we will learn how to read and write geographic data in the form of shapefiles
(.shp), the format used by the Geographical Information Systems (GIS) software created by
ESRI and by some other similar software.


Getting ready 


We are going to use the maptools package for this recipe. So let's install and load it first:

install.packages("maptools")
library(maptools)



We are going to read an example shapefile provided with the maptools package and plot it:

sfdata <- readShapeSpatial(system.file("shapes/sids.shp",
    package="maptools")[1], proj4string=CRS("+proj=longlat"))

plot(sfdata, col="orange", border="white", axes=TRUE)






















To write out the data as another shapefile, we can do:

writeSpatialShape(sfdata, "xxpoly")
We used the readShapeSpatial() function of the maptools package to read in a
shapefile. This function takes a shapefile name as an argument and reads the data into a
SpatialPolygonsDataFrame object. The first argument in the example is the path to the
example shapefile sids.shp, which is provided as part of the maptools package installation.
The second argument, proj4string, specifies the projection type as longlat so that the
spatial co-ordinates are interpreted correctly as longitudes and latitudes.

We saved the object returned by readShapeSpatial() as sfdata (of data class
SpatialPolygonsDataFrame), which we then passed to the plot() function to
create a map from the shapefile data.

Once we've read the data into the appropriate format, we can perform any transformations
on the data. To save the transformed dataset back into a shapefile, we use the
writeSpatialShape() function, which takes the data object as the first argument and the
name of the output shapefile (without any file type extension) as the second argument.


There's more... 


There is another package called shapefiles, which can be used to read and write
shapefiles. To use it, we must first install and load it:

install.packages("shapefiles")

library(shapefiles)

To read a shapefile using this package, we can use the read.shapefile() function:

sf <- system.file("shapes/sids.shp", package="maptools")[1]
sf <- substr(sf, 1, nchar(sf) - 4)
sfdata <- read.shapefile(sf)

We first saved the path of the sids.shp example file in a variable called sf. We had to trim
the path string to remove the extension .shp because the read.shapefile() function
takes just the name of the shapefile as its argument. The shapefile data is saved in a list
called sfdata.
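The substr() trick relies on the extension being exactly four characters long. Here is a sketch of the same step next to an alternative from base R's tools package, file_path_sans_ext(), which removes the extension whatever its length; the path below is a made-up stand-in for the one returned by system.file().

```r
# A made-up path standing in for the one returned by system.file()
sf <- file.path("library", "maptools", "shapes", "sids.shp")

# Approach used in the recipe: drop the last four characters (".shp")
trimmed1 <- substr(sf, 1, nchar(sf) - 4)

# Alternative: strip the extension, whatever its length
trimmed2 <- tools::file_path_sans_ext(sf)

trimmed1
trimmed2
```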

To write out a shapefile using this package, we need to use the write.shapefile()
function:

write.shapefile(sfdata, "newsf")

The write.shapefile() function takes two key arguments: the first is the data object (sfdata in
the example) and the second is the name of the new shapefile without any file extension.
Finalizing graphs
for publications and 
presentations 


In this chapter, we will cover: 

► Exporting graphs to high resolution image formats: PNG, JPEG, BMP, TIFF 

► Exporting graphs to vector formats: SVG, PDF, PS 

► Adding mathematical and scientific notations (typesetting) 

► Adding text descriptions to graphs 

► Using graph templates 

► Choosing font families and styles under Windows, Mac OS X, and Linux

► Choosing fonts for PostScripts and PDFs 


Introduction 


In the previous chapters, we have learnt how to make graphs of different types and styles 
using various functions and arguments. In this chapter, we will learn some tricks and tips to 
add some polish to our graphs so that they can be used for publication and presentation. 

We will look at the different image file formats we can save our graphs in and learn how 
to export our graphs at high resolutions. Most publications require authors to submit high 
resolution figures along with their manuscripts. We will also look in more detail at vector 
formats such as PDF, SVG, and PS, which are preferred by most publications since these 
are resolution-independent formats. 






We will also learn how to add mathematical and scientific notations to graphs. These are
indispensable in any scientific data visualization. We will also see how to add text descriptions
inside graphs, which can be very handy when graphs double as presentation slides. Graph
templates save time by wrapping repetitive code in functions, so that once we are happy
with the basic structure of a graph, we can experiment with various pre-defined themes to
choose the most appropriate color combinations and styles.

Finally, we will also look at how to choose fonts under different operating systems and graphic 
devices. We will also learn how to add new font mappings and to choose additional font 
families for vector file formats. 

As with the previous chapters, it is best to try out each recipe first with the example shown 
here and then with your own datasets so that you can fully understand each line of code. If 
you are preparing any graph for publication or presentation, it is also good practice to print out 
the saved graphs and verify that the printed output looks correct and clear. 


Exporting graphs in high resolution image 
formats: PNG, JPEG, BMP, TIFF 


In this recipe, we will learn how to save graphs in high resolution image formats for use in 
presentations and publications. 


Getting ready 


We are only using the base graphics functions for this recipe. So, just run the R code at the 
R prompt. You may wish to save the code as an R script so that you can use it again later. 



Let's re-create a simple scatter plot example from Chapter 1 and save it as a PNG file 600 px
high and 600 px wide with a resolution of 200 dots per inch (dpi):


png("cars.png", res=200, height=600, width=600)

plot(cars$dist ~ cars$speed,
    main="Relationship between car distance and speed",
    xlab="Speed (miles per hour)", ylab="Distance travelled (miles)",
    xlim=c(0, 30), ylim=c(0, 140),
    xaxs="i", yaxs="i", col="red", pch=19)

dev.off()







The resulting cars.png file looks like the following:

[Figure: the cars scatter plot saved at 200 dpi; x axis: Speed (miles per hour)]


The pictured graph has a high resolution, but the layout and formatting have been lost. So, let's
create a high resolution PNG while preserving the formatting:

png("cars.png", res=200, height=600, width=600)

par(mar=c(4, 4, 3, 1), omi=c(0.1, 0.1, 0.1, 0.1), mgp=c(3, 0.5, 0),
    las=1, mex=0.5, cex.main=0.6, cex.lab=0.5, cex.axis=0.5)

plot(cars$dist ~ cars$speed,
    main="Relationship between car distance and speed",
    xlab="Speed (miles per hour)", ylab="Distance travelled (miles)",
    xlim=c(0, 30), ylim=c(0, 140),
    xaxs="i", yaxs="i",
    col="red", pch=19, cex=0.5)

dev.off()
The resulting PNG file looks like the following:

[Figure: "Relationship between car distance and speed", the same scatter plot at 200 dpi with its layout preserved]


To save our graph as a high resolution PNG (200 dpi), we had to set the res argument of
the png() function to a value of 200. By default, res is NA, in which case R works at
72 dpi. We also set both the height and width arguments to 600.

In the first example, we can see that simply specifying the resolution and dimensions of 
the PNG file is not enough. The resultant image loses its original formatting and layout. In 
addition to specifying the resolution and size, we also need to re-adjust the margins and 
sizes of various graph elements, including the data points, axis, plot titles, and axis labels. 
We set these parameters using the par() function and its arguments, as we learnt in
Chapter 1 and Chapter 2.

To save the graphs as even higher resolution images, we would again need to adjust the 
relative margins and sizes of the graph components. 
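Why does the first example lose its layout? Text sizes and margins are defined in points and lines, which are physical units (a point is 1/72 inch), so what matters is the physical size of the image: 600 px at 200 dpi is only 3 inches square, whereas at 72 dpi the same 600 px image is about 8.33 inches square, so all text takes up roughly 200/72, or 2.8 times, as much of the image. As an alternative to shrinking every element with par(), here is a sketch, assuming the goal is to keep the default layout: keep the physical size and raise the pixel count. inches_to_px() is a hypothetical helper.

```r
# Hypothetical helper: pixels needed for a physical size (inches) at a given dpi
inches_to_px <- function(inches, dpi) round(inches * dpi)

inches_to_px(3, 200)         # 600: the recipe's 600 x 600 px image is 3 inches square
inches_to_px(600 / 72, 200)  # 1667: 8.33 inches at 200 dpi

# Keeping the physical size of the 72-dpi graph keeps its layout;
# with units = "in", png() converts inches to pixels itself.
out <- file.path(tempdir(), "cars_200dpi.png")
if (capabilities("png")) {
  png(out, width = 600 / 72, height = 600 / 72, units = "in", res = 200)
  plot(cars$dist ~ cars$speed, col = "red", pch = 19,
       main = "Relationship between car distance and speed")
  dev.off()
}
```

The trade-off is a larger file; the par() approach in the recipe gives a smaller image at the cost of re-tuning every size setting.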



There's more...



To save a graph in other formats such as JPEG, BMP, and TIFF, we can use the res argument
in the jpeg(), bmp(), and tiff() functions respectively.
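Because these devices share the png() arguments, the formatted example can be written to all three formats in one loop. Here is a sketch, reusing the par() settings from the recipe, that writes to a temporary directory and skips any device this R build cannot open (for example, on a machine without the needed graphics libraries).

```r
# Save the recipe's formatted plot as JPEG, BMP, and TIFF at 200 dpi.
devices <- list(jpeg = jpeg, bmp = bmp, tiff = tiff)
made <- character(0)

for (ext in names(devices)) {
  out <- file.path(tempdir(), paste0("cars.", ext))
  opened <- tryCatch({
    devices[[ext]](out, width = 600, height = 600, res = 200)
    TRUE
  }, error = function(e) FALSE)
  if (!opened) next                      # device unavailable: skip this format

  par(mar = c(4, 4, 3, 1), mgp = c(3, 0.5, 0), las = 1, mex = 0.5,
      cex.main = 0.6, cex.lab = 0.5, cex.axis = 0.5)
  plot(cars$dist ~ cars$speed, col = "red", pch = 19, cex = 0.5,
       main = "Relationship between car distance and speed",
       xlab = "Speed (miles per hour)", ylab = "Distance travelled (miles)")
  dev.off()
  made <- c(made, out)
}
made
```

jpeg() also takes a quality argument (75 by default), and tiff() a compression argument such as "lzw", which are worth checking against a publisher's requirements.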















See also 
In the next recipe, we will learn how to save graphs in vector formats.


Exporting graphs in vector formats: 
SVG, PDF, PS 


In this recipe, we will learn how to save graphs in vector formats such as PDF, SVG, and 
PostScript (PS), which are resolution-independent. 


Getting ready 


Once again we will use the basic graph functions. So, just make sure you have started R and 
type the code at the R prompt. 



Let's use the same scatter plot example from the previous recipe and save it in different 
vector formats, starting with PDF: 


pdf("cars.pdf")

plot(cars$dist ~ cars$speed,
    main="Relationship between car distance and speed",
    xlab="Speed (miles per hour)", ylab="Distance travelled (miles)",
    xlim=c(0, 30), ylim=c(0, 140),
    xaxs="i", yaxs="i",
    col="red", pch=19, cex=0.5)

dev.off()

Similarly, we can save the graph as SVG or PS using the svg() and postscript()
functions respectively:

svg("3067_10_03.svg")
# plot command here
dev.off()

postscript("3067_10_03.ps")
# plot command here
dev.off()


The vector format export commands are similar to the image format commands we saw in the
previous recipe. First we open a new device by calling the pdf(), svg(), or postscript()
function with the output filename as its only argument, then issue the plot command, and
finally close the device with dev.off().



Windows users will have to use the CairoSVG() command in order to export
files to SVG format. First install and load the Cairo package:

install.packages("Cairo")
library(Cairo)

And then use the following commands:

CairoSVG("3067_10_03.svg")
# plot command here
dev.off()


Since vector formats are resolution-independent, you can zoom in or out of them without 
losing any clarity of the graph. Size does not affect the resolution. So, unlike the image 
formats in the previous recipe, we did not have to re-adjust the graph margins and component 
sizes to save the graph as PDF, SVG, or PS. 
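Resolution independence does not make the physical size irrelevant: pdf() measures width and height in inches (7 by 7 by default) and text in points, so scaling a PDF down afterwards also scales its text down. Here is a sketch, assuming a single journal column 3.5 inches wide and an 8-point base font (both hypothetical choices), that sets the final size up front and then checks that the output really is a PDF.

```r
# Draw the graph at its final printed size; text keeps its point size.
out <- file.path(tempdir(), "cars_column.pdf")
pdf(out, width = 3.5, height = 3.5, pointsize = 8)
plot(cars$dist ~ cars$speed, col = "red", pch = 19, las = 1,
     main = "Relationship between car distance and speed",
     xlab = "Speed (miles per hour)", ylab = "Distance travelled (miles)")
dev.off()

# A PDF file starts with the bytes "%PDF"
rawToChar(readBin(out, what = "raw", n = 4))
```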


There's more... 


We can save more than one graph in a single PDF file by setting the onefile argument to
TRUE (the default value). This is useful for presentations. All we have to do is issue
the pdf() command with the output filename, then issue all the plot commands in the
desired order, and close the device with dev.off(). For example, let's make three variations
of the cars plot with three different colors for the data points and save them into one file:

pdf("multiple.pdf")

for (i in 1:3)
    plot(cars, pch=19, col=i)

dev.off()
Another important setting when saving graphs in vector formats is the color model. Most
publications require authors to use the CMYK (Cyan Magenta Yellow Key) color model in
their graphs, instead of the default RGB (Red Green Blue) model. We can save our graphs
as PDFs or PostScripts with the CMYK color model simply by setting the colormodel
argument to "cmyk":

pdf("multiple.pdf", colormodel="cmyk")

for (i in 1:3)
    plot(cars, pch=19, col=i)

dev.off()

By default, colormodel is set to "rgb". The other possible value is "gray" for grayscale.
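To get a feel for what moving to CMYK does to a color, here is the simple textbook conversion from RGB, sketched in base R with col2rgb(). rgb_to_cmyk() is a hypothetical helper, and R's devices may convert slightly differently, so treat the numbers as an illustration rather than what ends up in the PDF.

```r
# Naive RGB -> CMYK conversion (common textbook formula; not necessarily
# the exact conversion R applies internally).
rgb_to_cmyk <- function(col) {
  rgb01 <- col2rgb(col) / 255                       # 3 x n matrix, values in [0, 1]
  k <- 1 - apply(rgb01, 2, max)                     # black (key) component
  denom <- ifelse(k < 1, 1 - k, 1)                  # avoid 0/0 for pure black
  cmy <- sweep(1 - rgb01 - rep(k, each = 3), 2, denom, "/")
  out <- rbind(cmy, k)
  dimnames(out) <- list(c("C", "M", "Y", "K"), col)
  round(out, 3)
}

rgb_to_cmyk(c("red", "black", "gray50"))
```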


Adding mathematical and scientific 
notations (typesetting) 


Producing graphs for scientific journals is rarely done without adding some special
scientific and mathematical notations, such as subscripts, superscripts, symbols, and other
notations. In this recipe, we will learn how to add these annotations to our graphs.


Getting ready 


We are only using base graphics functions for this recipe. So, just open up the R prompt and
type the following code. We will use the airpollution.csv example dataset for this recipe.
So let's first load it:

air <- read.csv("airpollution.csv")
How to do it...


Let's make a scatter plot of concentrations of Particulate Matter (PM) versus Nitrogen
Oxides (NOx) and add titles with subscripts as in PM10 and NOx, and units μg m^-3:

plot(air, las=1,
    main=expression(paste("Relationship between ", PM[10], " and ", NO[X])),
    xlab=expression(paste(NO[X], " concentrations (", mu*g~m^-3, ")")),
    ylab=expression(paste(PM[10], " concentrations (", mu*g~m^-3, ")")))

[Figure: "Relationship between PM10 and NOx", scatter plot with x axis "NOx concentrations"]


How it works...


In the example, we added three new elements of special formatting and notation: subscripts,
superscripts, and a Greek symbol, using the expression() function.

The expression() function accepts arguments in a pre-defined syntax and translates
them into the desired format or symbol. For example, any characters enclosed within square
brackets [] are converted to subscripts, such as the X in NOx and 10 in PM10. Similarly, any
characters following the ^ sign are converted to superscripts, such as the power value -3 in
μg m^-3, and the ~ sign inserts a space, here between μg and m. The letters mu are converted
to the symbol μ denoting micro.

In the example, we used a combination of regular text and expressions by using the
expression() function with the paste() function.
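A related technique, not used in the recipe: when part of a label comes from the data, such as a mean concentration, bquote() can splice a computed value into a plotmath expression, because anything wrapped in .() is evaluated first. Here is a sketch with made-up PM10 readings, drawn to a temporary PDF file.

```r
# Hypothetical PM10 readings, used only to compute a value for the title
pm10 <- c(31.2, 28.7, 35.9)

# .() inserts the rounded mean into the otherwise unevaluated expression
label <- bquote("Mean " * PM[10] == .(round(mean(pm10), 1)) ~ mu * g ~ m^-3)

out <- file.path(tempdir(), "plotmath_label.pdf")
pdf(out)
plot(pm10, type = "b", las = 1, xlab = "Reading", ylab = "", main = label)
dev.off()

label
```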


There's more... 


There are a lot more options and functions we can use inside expression() to create far
more advanced notations than subscripts and superscripts. For example, integral(),
frac(), sqrt(), and sum() can be used to create mathematical signs for integrals,
fractions, square roots, and sums respectively.
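A quick way to try these four constructs is to draw each one on an empty plot. Here is a sketch that uses the same argument forms as the plotmath documentation (lower and upper limits as the second and third arguments) and writes to a temporary PDF file.

```r
out <- file.path(tempdir(), "plotmath_examples.pdf")
pdf(out)
plot(c(0, 1), c(0, 1), type = "n", axes = FALSE, xlab = "", ylab = "")
text(0.5, 0.85, expression(integral(f(x) * dx, a, b)), cex = 1.5)  # integral from a to b
text(0.5, 0.60, expression(frac(x, y)), cex = 1.5)                 # fraction x over y
text(0.5, 0.40, expression(sqrt(x^2 + y^2)), cex = 1.5)            # square root
text(0.5, 0.15, expression(sum(x[i], i == 1, n)), cex = 1.5)       # sum for i = 1 to n
dev.off()
```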

To see and learn all the possible options and symbols, run the following command at the 
R prompt: 

demo(plotmath)

You will see the following symbols displayed on the plot device. You will need to press Return 
or Enter to progress through each set of symbols: 


[Figure: demo(plotmath) output, listing each expression syntax next to its rendered result, grouped under Arithmetic Operators, Radicals, Relations, Sub/Superscripts, Juxtaposition, Typeface, Lists, Ellipsis, Arrows, Set Relations, Symbolic Names, Accents, Style, Spacing, Fractions, Big Operators, and Grouping]
Adding text descriptions to graphs


Sometimes we may wish to add descriptions to a graph, say if we are producing a
PDF for presentation or as a handout with notes. In this recipe, we will learn how to
add text descriptions in the margins of a graph, instead of having to add them separately
in another program.


Getting ready


We are only using the base graphics functions for this recipe. So, just open up the R prompt and
type the code we are about to see. You may wish to save the code as an R script for later use.


How to do it...


Let's plot a random normal distribution and add a little bit of description below the graph:

par(mar=c(12, 4, 3, 2))

plot(rnorm(1000), main="Random Normal Distribution")

desc <- expression(paste("The normal distribution has density ",
    f(x) == frac(1, sqrt(2*pi)*sigma) ~ plain(e)^frac(-(x - mu)^2, 2*sigma^2)))

mtext(desc, side=1, line=4, padj=1, adj=0)

mtext(expression(paste("where ", mu, " is the mean of the distribution and ",
    sigma, " the standard deviation.")),
    side=1, line=7, padj=1, adj=0)

[Figure: "Random Normal Distribution", 1000 random normal values plotted against Index (0 to 1000), with the density formula and the definitions of μ and σ typeset in the bottom margin]
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ow shows sales data for Product A in the month of 
were a lot of ups and downs in the number of units 
mber of units sold was around 5000. The highest 
on t he 2 7th January, nearly 7 0 0 0 units sold." 


desc <- 11 The g 
January 2010 
sold. The av 
sales were r 


mt ext (pa s t e( s t r wr a p( aes c, wi at h=80), coll apse=" \ 
n"), si de=3, I i ne=3, padj =0, adj =0) 

title( 11 Daily Sales Trends",line = 10,adj=0,font =2) 



In the example, we set the bottom margin of the plot to a high value and used the mt e x t () 
function to add a small description below the graph. 

We created an expression called desc with the expressi o n ( ) function we saw in the 
previous recipe and used mt e x t ( ) to place it in the fourth line of the bottom margin. To make 
the text top-left aligned we set p a d j to 1 and adj to 0. We used mt e x t ( ) again to place the 
other half of the description on the seventh line of the margin. We had to split the description 
into two halves and use mt e x t ( ) twice because we couldn't automatically line wrap an 
expression. We will soon see another example with a text-only description, where we can wrap 
it in just one mt e x t () function call. 


There's more... 


Let's look at another example, where we add the description above the graph but just below 
the title. This time the description will just be plain text and will not contain any expressions. 
We will use the d a i I y s a I es. c s v example dataset and make a line graph of daily sales data: 

da i I ysal es<- read. csv( 11 dai I ysal es.csv") 
par ( mar =c( 5, 5, 12, 2)) 

pi ot (u ni t s -as. Dat e( dat e, 11 %d/ %m/ %y") , dat a=dai I ysal es, t y pe=" I ", 

I as=l, yl ab="Uni ts Sol d", xl ab=" Date") 
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This will produce the following graph: 


Jan 04 Jan 09 Jan 14 Jan 19 Jan 24 Jan 29 
Date 


In the example, we set the margins such that the top margin is 12 lines wide. We created a 
string called des c with the description for the graph. We then used mt e x t () to place the 
string in the third line of the margin. We couldn't simply pass des c to mt e x t ( ) because 
it wouldn't fit within the width of the plot area and would get chopped off after the first 
sentence. So we used the s t r wr a p () function to wrap the string with a wi dt h of 8 0 
characters. We used the paste)) function to join the split strings created byst r wr ap(), 
with line breaks added by setting the col I apse argument to " \ n". Finally, we used the 
title)) function to add a graph title on top. 


Daily Sales Trends 


The graph below shows sales data for Product A in the month of January 2010. 
There were a lot of ups and downs in the number of units sold. The average 
number of units sold was around 5000. The highest sales were recorded on the 
27th January, nearly 7000 units sold. 
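To see what strwrap() hands over to paste(), the wrapping step can be run on its own, without drawing anything. Here is a sketch that re-creates desc from the example above:

```r
desc <- paste("The graph below shows sales data for Product A in the month of",
              "January 2010. There were a lot of ups and downs in the number of",
              "units sold. The average number of units sold was around 5000.",
              "The highest sales were recorded on the 27th January, nearly 7000",
              "units sold.")

wrapped <- strwrap(desc, width = 80)   # character vector, one element per line
nchar(wrapped)                         # no line is longer than 80 characters
cat(paste(wrapped, collapse = "\n"))   # the single string that mtext() receives
```

Changing width is the easiest way to fit the description to a wider or narrower plot area.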
Using graph templates


We may often find ourselves using similar code repetitively to plot similar kinds of data or 
different versions of the same dataset. Once we have analyzed our data and are looking to 
produce a finished graph, it can be useful to quickly try out different color combinations and 
other aesthetic settings without having to write too much repetitive code. In this recipe, we 
will learn how to create graph templates and use them to quickly try out various "looks" for 
a graph. 


Getting ready 


We are only using the base graphics functions for this recipe. So, just open up the R prompt
and type the following code. We will use the themes.csv file, which contains theme
parameters for this recipe. So let's first load it:

themes <- read.csv("themes.csv")



We will make a simple scatter plot showing a random normal distribution, and apply different 
color combination themes to it with a single command: 

themeplot <- function(x, theme, ...) {
    i <- which(themes$theme == theme)
    par(bg=as.character(themes[i, ]$bg_color), las=1)

    plot(x, type="n", ...)

    u <- par("usr")

    plotcol <- as.character(themes[i, ]$plot_color)
    rect(u[1], u[3], u[2], u[4], col=plotcol, border=plotcol)

    points(x, col=as.character(themes[i, ]$symbol_color), ...)
    box()
}
Using this function, we can create a scatter plot using different themes such as the following:

themeplot(rnorm(1000), theme="white", pch=21, main="White")

[Figure: themed scatter plot titled "White"]

themeplot(rnorm(1000), theme="lightgray", pch=21, main="Light Gray")

themeplot(rnorm(1000), theme="pink", pch=21, main="Pink")

[Figure: themed scatter plot titled "Pink"]




In the preceding example, we created a function called themeplot(), which used
pre-defined color combinations from the themes.csv file to create different themed graphs.

We first read the themes.csv file into a data frame called themes, which contains
four columns:

► theme (name of the theme)

► bg_color (figure background color)

► plot_color (plot region color)

► symbol_color (color of plotting symbol)

We then created the themeplot() function, which accepts the plotting variable x and
theme as arguments. The trailing "..." means that additional arguments can be passed,
which are handed on to plot() and points() within the themeplot() function definition.
themeplot() uses the which() function to find the row index of the specified theme and
then uses the corresponding column values to set the figure background color in par(), the
plot region color in rect(), and the symbol color in points().
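The lookup step can be separated from the drawing so that individual colors can override the theme, which is one way to make the adjustment mentioned later in this recipe. Here is a sketch: the inline table stands in for themes.csv, and its theme names and colors, as well as theme_colors() and themeplot2(), are hypothetical.

```r
# Hypothetical theme table standing in for themes.csv (colors are assumptions)
themes <- data.frame(
  theme        = c("white", "lightgray", "dark", "pink"),
  bg_color     = c("white", "gray90", "gray20", "mistyrose"),
  plot_color   = c("white", "gray97", "gray35", "lavenderblush"),
  symbol_color = c("black", "gray30", "gold", "deeppink"),
  stringsAsFactors = FALSE)

# Look up a theme's colors; any color supplied explicitly overrides the table
theme_colors <- function(theme, bg_color = NULL, plot_color = NULL,
                         symbol_color = NULL) {
  i <- which(themes$theme == theme)
  if (length(i) != 1) stop("unknown theme: ", theme)
  list(bg     = if (is.null(bg_color)) themes$bg_color[i] else bg_color,
       plot   = if (is.null(plot_color)) themes$plot_color[i] else plot_color,
       symbol = if (is.null(symbol_color)) themes$symbol_color[i] else symbol_color)
}

# themeplot() rebuilt on theme_colors(); restores the old par() settings on exit
themeplot2 <- function(x, theme, ..., bg_color = NULL, plot_color = NULL,
                       symbol_color = NULL) {
  cols <- theme_colors(theme, bg_color, plot_color, symbol_color)
  op <- par(bg = cols$bg, las = 1)
  on.exit(par(op))
  plot(x, type = "n", ...)
  u <- par("usr")
  rect(u[1], u[3], u[2], u[4], col = cols$plot, border = cols$plot)
  points(x, col = cols$symbol, ...)
  box()
  invisible(cols)
}

pdf(file.path(tempdir(), "themes_demo.pdf"))
themeplot2(rnorm(200), "dark", pch = 21)                          # table colors
themeplot2(rnorm(200), "dark", pch = 21, symbol_color = "white")  # one override
dev.off()
```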















I \n(fami I y=\"sans\", f ont=l)", 
nt =1, adj =0) 

I Bol d \ n( f ami I y=\ 11 sans \ 11 , font =2) 11 , 
nt =2, adj =0) 

I Italic \ n( f ami I y=\ 11 sans\", font =3)", 
nt =3, adj =0) 

Bold Italic \ n( f ami I y=\" sans \", font =4)", 
nt =4, adj =0) 


maI n =" Font s under Wi naows", axes=FALSE, xl ab = " 
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Once the function is defined, all we have to do to try out different color combinations is pass 
the t he me argument to t h e me p I o t ( ) . If we wish to modify the color combinations or add 
new themes we can simply edit the t h e me s. c s v file and re-read it. We can also adjust 
the function definition so that we can pass the color values separately to override the 
theme specifications. 


There's more... 


In the example, we chose some very simple color parameters to demonstrate the usefulness 
of themes. However, we could easily add more columns to the themes definitions, such as 
symbol types, sizes, line types and colors, fonts, grid line styles, legend styles, and so on. It is 
best to work with your own dataset and define themes as you go along and have a better idea 
of what your specific requirements are. Once you have the structure of the graph decided, you 
can define various themes to quickly experiment and choose from. 


Choosing font families and styles under 
Windows, Mac OS X, and Linux 


In this recipe we will see how to choose font families and styles under the three most popular 
operating systems, namely, Windows, Mac OS X, and Linux. 


Getting ready 


We are only using base graphics functions for this recipe. So, just open up the R prompt and 
type the following code. You may wish to save the code as an R script for later use. 


How to do it... 


Let's look at all the basic default fonts available under Windows: 




par(mar=c(1,1,5,1))
plot(0:200, 0:200, type="n", main="Fonts under Windows", axes=FALSE, xlab="", ylab="")

text(0, 180, "Arial \n(family=\"sans\", font=1)", family="sans", font=1, adj=0)
text(0, 140, "Arial Bold \n(family=\"sans\", font=2)", family="sans", font=2, adj=0)
text(0, 100, "Arial Italic \n(family=\"sans\", font=3)", family="sans", font=3, adj=0)
text(0, 60, "Arial Bold Italic \n(family=\"sans\", font=4)", family="sans", font=4, adj=0)

text(70, 180, "Times \n(family=\"serif\", font=1)", family="serif", font=1, adj=0)
text(70, 140, "Times Bold \n(family=\"serif\", font=2)", family="serif", font=2, adj=0)
text(70, 100, "Times Italic \n(family=\"serif\", font=3)", family="serif", font=3, adj=0)
text(70, 60, "Times Bold Italic \n(family=\"serif\", font=4)", family="serif", font=4, adj=0)

text(130, 180, "Courier New \n(family=\"mono\", font=1)", family="mono", font=1, adj=0)
text(130, 140, "Courier New Bold \n(family=\"mono\", font=2)", family="mono", font=2, adj=0)
text(130, 100, "Courier New Italic \n(family=\"mono\", font=3)", family="mono", font=3, adj=0)
text(130, 60, "Courier New Bold Italic \n(family=\"mono\", font=4)", family="mono", font=4, adj=0)



[Figure: "Fonts under Windows": sample labels in Arial (family="sans"), Times (family="serif"), and Courier New (family="mono"), each shown in regular (font=1), bold (font=2), italic (font=3), and bold italic (font=4).]


How it works... 


In the example, we demonstrated all the combinations of the basic font faces and families 
available in R under Windows. Fonts are specified in R by choosing a font family and a font 
face. There are three main font families, sans, serif, and mono, which are mapped onto 
specific fonts under different operating systems. As shown in the example, under Windows 
sans maps to Arial, serif to Times New Roman, and mono to Courier New. The font family is 
specified by the family argument, which can be passed to the text() function (as in the 
example), to par() (thus applying it to all text in the plot), and to mtext() and title(). 
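A minimal sketch of the three ways of passing family described above. It draws to a temporary PDF file so that it runs on any operating system; note that on the pdf device, sans, serif, and mono map to that device's own fonts (covered in the next recipe) rather than to Arial, Times New Roman, and Courier New. Drop the pdf() and dev.off() lines to draw on the screen instead.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)

# par(family = ...) sets the default family for all text drawn afterwards
op <- par(family = "serif")
plot(1:10, main = "Title and axis text inherit the serif family")

# A family passed to text() overrides the par() default for that call only
text(5, 8, "This label uses mono", family = "mono")

# mtext() and title() accept the same argument
mtext("Margin text in sans", side = 3, line = 0.2, family = "sans")
title(sub = "Subtitle in mono", family = "mono")

par(op)  # restore the previous settings
dev.off()
```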
The font face can take four basic values denoted by the numbers 1 to 4, which stand for 
regular, bold, italic, and bold italic respectively. The default value of font is 1. Note that font 
only applies to text inside the plot area. To set the font face for axis annotations, axis labels, and 
the plot title, we need to use font.axis, font.lab, and font.main respectively. 
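For example, the following sketch sets a different face for each of these elements through par(). It writes to a temporary PDF file only so that it runs anywhere; the face codes are the same 1 to 4 described above.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)

# Face codes: 1 = regular, 2 = bold, 3 = italic, 4 = bold italic
op <- par(font.main = 4,  # plot title in bold italic
          font.lab  = 2,  # x and y axis labels in bold
          font.axis = 3)  # tick-mark annotations in italic
plot(1:10, main = "Bold italic title", xlab = "Bold x label", ylab = "Bold y label")

# font only changes text drawn inside the plot region, such as this label
text(5, 5, "Bold text inside the plot", font = 2)

par(op)
dev.off()
```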

In the example, we created a plot area with X and Y co-ordinates running from 0 to 200 
each, but suppressed drawing of any axes or annotations. Then we used the text() 
function to draw text labels showing the 12 combinations of the three font families 
and four font faces. 
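The twelve text() calls can also be generated in a loop. The sketch below assumes the same 0 to 200 plotting area and uses expand.grid() to enumerate every family and face pair. The coordinates are illustrative choices, and the output goes to a temporary PDF file, where the three families map to the PDF device's fonts rather than the Windows ones.

```r
families <- c("sans", "serif", "mono")
faces    <- c("Regular", "Bold", "Italic", "Bold Italic")

# One row per combination: face varies fastest, then family (12 rows)
combos <- expand.grid(face = 1:4, fam = 1:3)

out <- tempfile(fileext = ".pdf")
pdf(out, width = 9, height = 6)
par(mar = c(1, 1, 5, 1))
plot(0:200, 0:200, type = "n", axes = FALSE, xlab = "", ylab = "",
     main = "Font families and faces")

for (i in seq_len(nrow(combos))) {
  fam  <- families[combos$fam[i]]
  face <- combos$face[i]
  label <- sprintf("%s %s\n(family=\"%s\", font=%d)",
                   fam, faces[face], fam, face)
  # Columns at x = 0, 70, 140; rows at y = 180, 140, 100, 60
  text(c(0, 70, 140)[combos$fam[i]], c(180, 140, 100, 60)[face],
       label, family = fam, font = face, adj = 0)
}
dev.off()
```

A loop is needed because family is applied per text() call rather than per label.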


There's more... 


As you may have noticed, we did not specify the actual font names (such as Arial) in the 
text() command. Instead, we used the keywords sans, serif, and mono to refer to the 
corresponding default fonts under Windows. We can check these font family mappings by 
running the windowsFonts() command at the R prompt, which lists the names of the fonts 
for each of the font families. We can also add new mappings using this function. For example, 
to add the font Georgia, we need to run: 

windowsFonts(GE = windowsFont("Georgia")) 

Then we can just set family to "GE" to use the Georgia font: 

text(150, 80, "Georgia", family="GE") 

Just like under Windows, there are default font families under Mac OS X and Linux. The serif 
and mono families map to similar fonts (Times and Courier), but the sans family usually maps to 
Helvetica rather than Arial. To check the default font mappings and add new font families, we 
need to use the X11Fonts() and quartzFonts() functions under Linux and Mac OS X respectively. 
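To see which of these functions applies on the machine you are using, the sketch below looks up the current mappings with windowsFonts(), quartzFonts(), or X11Fonts(), depending on the operating system. The helper name screen_font_map() is hypothetical, and testing for "Darwin" to detect Mac OS X is an assumption based on what Sys.info() reports there.

```r
# Hypothetical helper: return the family-to-font mappings used by the
# on-screen device of the current operating system.
screen_font_map <- function() {
  if (.Platform$OS.type == "windows") {
    grDevices::windowsFonts()
  } else if (Sys.info()[["sysname"]] == "Darwin") {
    grDevices::quartzFonts()
  } else {
    grDevices::X11Fonts()
  }
}

fonts <- screen_font_map()
# Show what sans, serif, and mono currently map to
str(fonts[c("sans", "serif", "mono")])
```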


See also 


In the next recipe we will see how to use additional font families available for vector formats 
such as PDF and PS. 


Choosing fonts for PostScripts and PDFs 


The pdf and postscript graphics devices in R have special functions that handle the 
translation of an R graphics font family name into a font in a PostScript or PDF file. In this recipe, we will 
see how to choose the fonts for these vector formats. 


Getting ready 


We are only using the base graphics functions for this recipe. So, just open up the R prompt and 
type the code we are about to see. You may wish to save the code as an R script for later use. 
How to do it... 



Let's create a PDF of an rnorm() graph with the title and axis annotations in the font 
Avant Garde: 

pdf("fonts.pdf", family="AvantGarde") 
plot(rnorm(100), main="Random Normal Distribution") 
dev.off() 


To save the same graph as a PostScript file, we can do: 

postscript("fonts.ps", family="AvantGarde") 
plot(rnorm(100), main="Random Normal Distribution") 
dev.off() 


How it works... 



As shown in the examples, the font family for a PDF or PostScript output is set in exactly the 
same way as in the previous recipe, by using the family argument. In the examples, we 
passed the family argument to the pdf() and postscript() functions, since they open 
the relevant graphics devices. 
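A short sketch combining the device-wide family with a per-call override: AvantGarde is the default for the whole file and one label is switched to Palatino (both names appear in the pdfFonts() list later in this recipe). Listing Palatino in the fonts argument declares it when the device is opened; for postscript() this declaration is needed before a non-default family can be used in the file, and for pdf() it is harmless. Temporary file names are used so nothing is overwritten.

```r
# Temporary output files so the sketch does not overwrite anything
pdf_out <- tempfile(fileext = ".pdf")
ps_out  <- tempfile(fileext = ".ps")

# AvantGarde is the device default; Palatino is declared for later use
pdf(pdf_out, family = "AvantGarde", fonts = "Palatino")
plot(rnorm(100), main = "Random Normal Distribution")
text(50, 0, "This label is set in Palatino", family = "Palatino")
dev.off()

# The same calls work for the postscript device
postscript(ps_out, family = "AvantGarde", fonts = "Palatino")
plot(rnorm(100), main = "Random Normal Distribution")
text(50, 0, "This label is set in Palatino", family = "Palatino")
dev.off()
```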

Note that we used a font family which was not available in the basic R graphics device. We 
can also use the default values sans, serif, and mono, which are mapped to Helvetica, 
Times, and Courier respectively. The pdf and postscript devices have 
built-in mappings to a lot of font families. To see all the available fonts, we can use the 
pdfFonts() command. Running pdfFonts() at the R prompt lists all the names of the font 
families and related attributes (metrics, encoding, and class). To list just the names of all font 
families, we can run: 

names(pdfFonts()) 


That gives the following output at the R prompt: 


 [1] "serif"                "sans"                 "mono"
 [4] "AvantGarde"           "Bookman"              "Courier"
 [7] "Helvetica"            "Helvetica-Narrow"     "NewCenturySchoolbook"
[10] "Palatino"             "Times"                "URWGothic"
[13] "URWBookman"           "NimbusMon"            "NimbusSan"
[16] "URWHelvetica"         "NimbusSanCond"        "CenturySch"
[19] "URWPalladio"          "NimbusRom"            "URWTimes"
[22] "Japan1"               "Japan1HeiMin"         "Japan1GothicBBB"
[25] "Japan1Ryumin"         "Korea1"               "Korea1deb"
[28] "CNS1"                 "GB1"


We can check the default mapping to sans by running pdfFonts()$sans at the R prompt. 
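The same lookup can be done for all three generic families at once. This small sketch pulls the family field out of each entry returned by pdfFonts(); in a standard R installation it should report Helvetica, Times, and Courier.

```r
# Subset the PDF font database to the three generic families
generic <- pdfFonts()[c("sans", "serif", "mono")]

# Each entry is a Type1Font object; its family field names the PDF font used
mapped <- vapply(generic, function(f) f$family, character(1))
print(mapped)

# The metrics field lists the AFM files for the regular, bold, italic,
# bold italic, and symbol faces
print(generic$sans$metrics)
```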















There's more... 


The postscript device has two extra fonts, Computer Modern and Computer Modern 
Italic (you can check this by running names(postscriptFonts()) at the R prompt). 
Just like the commands for specific operating systems, we can use pdfFonts() and 
postscriptFonts() to add new font mappings for the pdf and postscript devices 
respectively. Please refer to the help pages for examples of such mappings 
(?postscriptFonts and ?pdfFonts). 
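A minimal sketch of adding a mapping. The family name BookSerif is hypothetical; to avoid depending on font files that may not be installed, it reuses the AFM metric files R already ships for its serif (Times) family through Type1Font(). The pdf and postscript devices each look families up in their own database, so the mapping is registered with both functions.

```r
# Build a Type1Font description from the existing serif metrics; the first
# argument is a unique internal name for the new family
book_serif <- Type1Font("BookSerif", pdfFonts()$serif$metrics)

# Register it under the new (hypothetical) family name for both devices
pdfFonts(BookSerif = book_serif)
postscriptFonts(BookSerif = book_serif)

# The new name can now be used as the family of either device
out <- tempfile(fileext = ".pdf")
pdf(out, family = "BookSerif")
plot(rnorm(100), main = "Drawn with the BookSerif mapping")
dev.off()
```

With real fonts, you would point the metrics argument at the AFM files for those fonts instead.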
Index 


A 

abline() function 

about 85, 102, 105, 115, 184 
using 104 

aggregate() function 

about 117 
arguments 118 
using 118 

ann argument 157 
annotations 

adding, to graphs 229, 230 
fonts, setting for 54, 55 

args.legend argument 125 
arrows() function 80, 136 
as.Date() function 

about 13, 110 
arguments 110 
using 116 

at argument 101 
auto.key argument 72 
axes 

data density, displaying on 91, 92 

axes argument 157 
axis annotations 

about 50 
adjusting 63, 64 
colors, setting for 50, 51 

axis() function 

about 174, 184, 190 
using 51, 64 

axis labels 

about 50 

annotating, in human readable time formats 
113, 114 

colors, setting for 50, 51 


B 

background color 

setting, for plot 48 
bar charts 

about 16, 123 
creating 14-16 

creating, with multiple factor variable 124, 
125 

creating, with vertical error bars 135-137 
exact values, displaying on 132, 133 
example 16 

styles, adjusting for bars 130, 131 

bar charts, creating 

about 14-16 

with multiple factor variable 124, 125 
with vertical error bars 135-137 
barplot() function 
about 46, 124, 125, 133, 157 
using 14 
bars 

colors, setting for 44-46 

beside argument 16, 17, 125, 127 
bg argument 

about 48 
using 48 

bin size 

setting 148, 149 

border argument 131, 151 
boxcol argument 170 
box() command 

using 62 

boxfill argument 170 
boxlty argument 170 
boxlwd argument 170 
boxplot() function 20-22, 46, 160 



box plots 

about 20 
creating 20, 21 

creating, with horizontal boxes 167, 168 
creating, with narrow boxes 160, 161 
creating, with notches 165, 166 
observations, displaying on 172, 174 
outliers, excluding from 166, 167 
styles, changing for 169, 170 
whiskers, adjusting for 170, 172 
box plots, creating 
about 20, 21 

with horizontal boxes 167, 168 
with narrow boxes 160, 161 
with notches 165, 166 

box styles 
selecting 60, 61 

boxwex argument 161 
box widths 

varying, by number of observations 164, 165 

breaks 

setting, between bars 148, 149 

breaks argument 149, 150 
brewer.pal() function 54 
bty argument 61 

C 

Cairo package 228 
CairoSVG() command 228 
calendarHeat() function 199 
calendar heat maps 

creating 199 

calendar.plot() function 203 
cex argument 57 
clockwise argument 141 
clockwise-ordered slices 

pie chart, creating with 139-141 

closely packed data points 

distinguishing, jitter() function used 82-84 

cm.colors() palette 53 
col argument 

about 15, 35, 46, 101, 151 
example 44 
working 46 

col.axis argument 51 
col.lab argument 51 


collapse argument 236 
col.main argument 51 
color combinations 

selecting 52 

colormodel argument 229 
colors, setting 

for axis annotations 50, 51 
for axis labels 50, 51 
for bars 44-46 
for lines 44-46 
for points 44-46 
for text elements 50, 51 
Comprehensive R Archive Network. See CRAN 
contour() function 192 
contour plots 
creating 192, 193 
correlation heat maps 
creating 185-187 

creating, image() function used 26, 27 
working 187 

correlation matrix 

creating, pairs plot used 78, 79 

count frequencies 

distributions, visualising as 146, 147 

CRAN 8 

customized legends 

adding, for multiple line graphs 96-98 

D 

data 

plotting, on Google maps 215-218 
plotting, with varying time averaging periods 
117, 118 

data density 

displaying, on axes 91, 92 

data() function 11 
data_matrix 184 
data points 

grouping, within scatter plots 70-72 
labelling 75-77 

dataset 

functions of variables, plotting 107, 108 
non-linear model curves, adding to 85, 86 

density() function 19, 153 
density line 

overlaying, over histograms 152, 153 

density plots 
creating 18-20 
desc expression 235 
destfile argument 217 
dev.off() function 228 
dimensions 
adjusting 66, 67 
display.brewer.pal() command 
using 54 

distributions, for histograms 

visualising, as count frequencies 146, 147 
visualising, as probability densities 146, 147 

dotchart() function 138, 139 
dot charts 

modifying, by grouping variables 137, 138 

E 

error bars 

adding 79-81 

ESRI shapefiles 

about 220 
working with 221 

exact values 

displaying, on bar chart 132, 133 

expression() function 109, 230, 235 

F 


family argument 242 
filled.contour() function 195 
filled contour plots 

creating 194-197 

fin argument 66 
findInterval() function 209, 212 
fonts, selecting 

for PDFs 243-245 
for PostScripts 243-245 
under Linux 241 
under Mac OS X 241 
under Windows 241, 242 
fonts, setting 
for annotations 54, 55 
for titles 54, 55 


format() function 118 
formatted date values 

plotting, on X axis 112 

formatted time values 
plotting, on X axis 112 

frac() function 231 
freq argument 147 

G 

GADM database 

about 213 

URL, for downloading 213 

get.hist.quote() function 120 
GetMap() function 217 
GetMap.OSM() function 

URL 218 

getSymbols() function 120 
ggplot2 package 70 
global data 

plotting, by countries 206-209 

Google Maps 

data, plotting on 215-218 

graph margins 

adjusting 66, 67 

graphs 

annotations, adding to 229, 230 
creating, with maps 37, 38 
creating, with regional maps 210-212 
horizontal grid lines, adding to 102 
legends, adding to 33-36 
saving, as image file format 40, 41 
saving, in high resolution image formats 
224-226 

saving, in vector formats 227, 228 
text descriptions, adding to 234, 235 
vertical grid lines, adding to 102 

graph templates 

about 224, 237 
using 238-240 

grepl() function 209 
grid() function 102, 151 
grouped data points, highlighting 

by size 73-75 
by symbol type 73-75 




H 

heat.colors() function 

about 46, 187 
example 46, 47 

heat.colors() palette 53 
heatmap() function 

about 26, 184 
using 25 
heat maps 
about 24, 26, 181 
calendar heat maps 199 
correlation heat maps 185 
creating 25, 26 
example 26, 27 

with single Z variable along X and Y axes 182 

heat maps, of single Z variable with scale 

creating 182-184 

height argument 226 
heights argument 157 
high resolution image formats 

graphs, saving into 224-226 

hist() function 

about 46 
using 146, 147 
histograms 
about 146 
creating 18-20 

drawing, in margins of bivariate scatter plot 
155-157 

embedding, in another kind of graph 
153-155 

kernel density lines, imposing on 152, 153 

histogram styles 
adjusting 150, 151 

Hmisc package 82 
horiz argument 15, 98, 129, 130 
horizontal argument 168 
horizontal bars 

orientation, adjusting 128, 129 

horizontal boxes 

box plots, creating with 167, 168 

horizontal error bars 

drawing 80 

horizontal grid lines 

adding, to graphs 102 


human readable time formats 

axis labels, annotating in 113, 114 

I 

image file format 

graphs, saving as 40, 41 

image() function 

about 184, 188, 209 

correlation heat map, creating 26, 27 

installation, RgoogleMaps package 215 
integral() function 231 

J 


jitter() function 

about 82, 84 

closely packed data points, distinguishing 
82-84 

K 

Keyhole Markup Language. See KML data 
KML data 

creating 219 
reading 219 

L 


labels 

placing, inside bars 134, 135 

labels argument 139 
las argument 

about 161 
using 64 

lattice library 214 
lattice package 70 
layout() function 

about 184, 209 
arguments 157 
using 156 
legend() function 

about 35, 125, 128, 143 
arguments 97, 98 
using 97 
legends 

adding, to graphs 33-36 
adding, to pie chart 143, 144 
formatting 33-36 

length() function 47 
library() command 37 
line argument 101 
linear model lines 

adding 84, 85 
line graphs 

about 12, 96 
creating 12, 13 
example 235, 236 

lines 

colors, setting for 44-46 

lines() function 13, 46, 153 
line styles 

selecting 58-60 

Linux 

fonts, selecting under 241 

lmfit object 85 
lm() function 85 
log argument 65 
log axes 

formatting 65 

lower.panel argument 79 
lowess 

about 86 
adding 87 

lowess() function 87 
L-shaped box 
drawing, for plot area 62 

lwd argument 60, 98 
lty argument 98 

M 

Mac OS X 

fonts, selecting under 241 

main argument 11 
maps 

graphs, creating with 37, 38 

maps package 206 
mar argument 128, 157 
margin labels 

using, for multiple line graphs 99-101 


marker lines, adding 
at X axis 104, 105 
at Y axis 104, 105 

matchcol() function 212 
matplot() function 99 
medbg argument 170 
medcex argument 170 
medcol argument 170 
medlty argument 170 
medlwd argument 170 
medpch argument 170 
melt() function 138 
metals concentration box plot 
drawing, with horizontal bars 168 
mtext() function 101, 235, 236 
multiple line graphs 
customized legends, adding for 96-98 
margin labels, using for 99, 100, 101 
multiple plot matrix layouts 
creating 30, 31 
example 32, 33 
multivariate data 

summarizing, in heat map 187-190 

N 

narrow boxes 

box plots, creating with 160, 161 

ncol argument 98 
nls() function 86 
non-linear model curves 

adding, to dataset 85, 86 

notches 

box plots, creating with 165, 166 

nx argument 103 
ny argument 103 

O 

observations 

displaying, on box plots 172, 174 

oma argument 62 
omi argument 107 
onefile argument 228 
Open Street Map 

about 218 
URL 218 




order() function 140, 190 
orientation, adjusting 

for horizontal bars 128, 129 
for vertical bars 128, 129 

outbg argument 170 
outcex argument 170 
outcol argument 170 
outliers 

excluding, from box plots 166, 167 

outline argument 167 
outlty argument 170 
outlwd argument 170 
outpch argument 170 
outwex argument 170 

P 

pairs() command 28, 79 
pairs plots 

about 27 

correlation matrix, creating 78, 79 
creating 28, 29 

palette() function 
about 52 
working 52 
palettes 
about 52 
selecting 52 
panel.cor function 79 
panel.hist() function 154 
par() command 

about 48, 100, 107, 128, 151, 161, 226 
using 51 
working 31, 48 

paste() function 230, 236 
pch argument 57 
PDF fonts 

viewing 55 

pdfFonts() function 55 
pdf() function 41, 228 
PDFs 

fonts, selecting for 243-245 

percentage values 

pie chart, labelling with 141, 142 


pie chart 

about 124, 139 

creating, with clockwise-ordered slices 
139-141 

labelling, with percentage values for each 
slice 141, 142 
legend, adding to 143, 144 

pie() function 
using 140 

pin argument 66 
plot 

background color, setting 48 

plot() command 11, 44, 65, 70, 154 
PlotOnStaticMap() function 217, 218 
plotrix package 82 
plotting point symbol styles 

selecting 56, 57 

png() command 40, 226 
points 

colors, setting for 44-46 

points() function 11, 46, 107 
polynomial function 
plotting, example 109 
postscriptFonts() function 55 
postscript() function 227 
PostScripts 

fonts, selecting for 243-245 

probability densities 

distributions, visualising as 146, 147 

prob argument 147 
pseudo rainfall data 

creating, for French administrative regions 
214 

plotting, on map 214 

Q 

qplot() function 73 
qqline() function 90 
qqnorm() function 90 
Quantile-Quantile (Q-Q) plots 

making 89, 90 

quantmod package 

about 119, 120 
URL 121 
R 

R 

annotations, adding to graphs 229, 230 
bar charts, creating 14-16 
bar charts, creating with multiple factor variable 124, 125 

bar charts, creating with vertical error bars 
135-137 

box plots, creating 20, 21 
box plots, creating with narrow boxes 160, 
161 

density plots, creating 18-20 
dot charts, modifying by grouping variables 
137, 138 

exact values, displaying on bar charts 132, 
133 

graphs, creating with maps 37, 38 
graphs, saving in high resolution image 
formats 224-226 

graphs, saving in vector formats 227 
graph templates 237-240 
heat maps, creating 25, 26 
histograms, creating 18-20 
labels, placing inside bars 134, 135 
legend, adding to pie chart 143, 144 
legends, adding to graphs 33-36 
line graphs, creating 12, 13 
multiple plot matrix layouts, creating 30, 31 
orientation, adjusting for horizontal bars 128, 
129 

orientation, adjusting for vertical bars 128, 
129 

outliers, excluding from box plots 166, 167 
pairs plot, creating 28, 29 
scatterplot, creating 9, 10 
stacked bar charts, creating 126, 127 
styles, adjusting for bars 130, 131 
text descriptions, adding to graphs 234, 235 
URLs, for downloading 8 
rainbow() palette 53 
random normal distribution 
plotting 234 
range argument 172 
R base library 44 
rbinom() function 84 


RColorBrewer package 

about 53, 100, 124, 182, 206, 208 
using 53 

read.csv() function 13 
read.shapefile() function 221 
read.table() function 100 
rect() function 49, 240 
regional maps 

graphs, creating with 210-212 

res argument 226 
rgdal package 219 
rgl.material() function 199 
rgl.surface() function 198 
RgoogleMaps package 
about 215 
installing 215 
rnorm() function 19 
rug() function 91, 92 

S 

scale argument 218 
scale() function 190 
scatter plots 

about 8, 69, 70 
creating 9, 10 
creating, example 224, 227 
data points, grouping within 70-72 
making, with smoothed density representation 
93, 94 

saving, in PDF format 227 
saving, in png format 225 
saving, in PS format 227 
saving, in SVG format 227 

scatter plots, saving 

in PDF format 227 
in png format 225 
in PS format 227 
in SVG format 227 

seq() function 116 

smoothed density representation 

scatter plots, making with 93, 94 

smoothScatter() function 

about 93 
using 94 





space argument 131 
sparklines 

about 105 
creating 106, 107 

spplot() function 214 
sprintf() function 

about 142 
arguments 142 

sqrt() function 231 
stacked bar charts 

about 126 
benefits 127 
creating 126, 127 

drawing, with horizontal bars 129, 130 
example 127, 128 

staplecol argument 170 
staplelty argument 170 
staplelwd argument 170 
staplewex argument 170 
stock charts 
creating 119, 120 
strptime() function 110 
strwrap() function 236 
styles 

adjusting, for histograms 150, 151 
changing, for box plots 169, 170 

sum() function 231 
svg() function 227 
symbol type 

grouped data points, highlighting 73-75 

T 

terrain.colors() function 195, 198, 214 
terrain.colors() palette 53 
text argument 101 
text() command 76, 133, 190, 242 
text descriptions 
adding, to graphs 234, 235 
text elements 
colors, setting for 50, 51 
themeplot() function 240 
three-dimensional scatter plots 
preparing 87, 88 
three-dimensional surface plots 
creating 197, 198 


time series data 

formatting, for plotting 109, 110 
summarizing 199-204 

title() function 

about 236 
using 51 

titles 

fonts, setting for 54, 55 

topo.colors() palette 53 
tseries package 119 
type argument 13 

U 

upper.panel argument 79 

V 

variable 

grouping over 162, 163 

splitting, at arbitrary intervals 175-177 

varwidth argument 165, 172 
vector formats 

graphs, saving in 227, 228 

vertical bars 

orientation, adjusting 128, 129 

vertical error bars 

bar charts, creating with 135-137 
drawing 81 

vertical grid lines 

adding, to graphs 102 

vertical markers 

adding, for indication of specific time events 
115, 116 

visual settings 

for histograms 151 

W 

WDI package 206 
WDIsearch() function 208 
which() function 240 
whiskcol argument 170 
whisklty argument 170 
whisklwd argument 170 



width 

selecting 58-60 

width argument 131, 162, 226 
widths argument 157 
Windows 

fonts, selecting under 241, 242 

windowsFonts() command 243 
write.shapefile() function 221 

X 

X axes limits 

adjusting, for plots 22-24 

X axis 

formatted date values, plotting on 112 
formatted time values, plotting on 112 


marker lines, adding at 104, 105 

xaxp argument 63, 64 
xyplot() command 70-72 

Y 

Y axes limits 

adjusting, for plots 22-24 

Y axis 

marker lines, adding at 104, 105 

yaxp argument 63, 64 

Z 

zoo() function 111 
R Graphs Cookbook 


About Packt Publishing 


Packt, pronounced 'packed', published its first book "Mastering phpMyAdmin for Effective MySQL 
Management" in April 2004 and subsequently continued to specialize in publishing highly focused 
books on specific technologies and solutions. 

Our books and publications share the experiences of your fellow IT professionals in adapting and 
customizing today's systems, applications, and frameworks. Our solution-based books give you the 
knowledge and power to customize the software and technologies you're using to get the job done. 
Packt books are more specific and less general than the IT books you have seen in the past. Our 
unique business model allows us to bring you more focused information, giving you more of what 
you need to know, and less of what you don't. 

Packt is a modern, yet unique publishing company, which focuses on producing quality, 
cutting-edge books for communities of developers, administrators, and newbies alike. For more 
information, please visit our website: www.packtpub.com. 


About Packt Open Source 


In 2010, Packt launched two new brands, Packt Open Source and Packt Enterprise, in order to 
continue its focus on specialization. This book is part of the Packt Open Source brand, home 
to books published on software built around Open Source licences, and offering information to 
anybody from advanced developers to budding web designers. The Open Source brand also runs 
Packt's Open Source Royalty Scheme, by which Packt gives a royalty to each Open Source project 
about whose software a book is sold. 


Writing for Packt 


We welcome all inquiries from people who are interested in authoring. Book proposals should 
be sent to author@packtpub.com. If your book idea is still at an early stage and you would like to 
discuss it first before writing a formal book proposal, contact us; one of our commissioning editors 
will get in touch with you. 

We're not just looking for published authors; if you have strong technical skills but no writing 
experience, our experienced editors can help you develop a writing career, or simply get some 
additional reward for your expertise. 







